Energizing Change: Electric Vehicle Rise in Switzerland
Author
Pierre Vernet, Alexis Jeanneret, Hugo Troendle, Urs Hurni
Published
December 12, 2023
1 Introduction
1.1 Background and Motivation
In an era marked by unprecedented environmental challenges, our world is at a critical juncture where sustainable practices are no longer an option but a necessity. Climate change, driven by anthropogenic activities, poses an imminent threat to the delicate ecological balance. Recognizing the urgency of this global issue, we are compelled to investigate the specific context of Switzerland, the country we live in, and which has a lot of financial capacities to change rapidly.
The automotive sector plays a pivotal role in shaping the a sustainable world, and our focus on electric vehicle adoption in Switzerland reflects a strategic choice to comprehend the intricate interplay of factors driving this transition.
The motivations underlying our research stem from a multifaceted perspective. Firstly, the automotive industry is undergoing a transformative shift globally, with electric vehicles emerging as a key solution to reduce carbon emissions. By narrowing our focus to Switzerland, we aim to provide nuanced insights into the factors influencing EV adoption, offering a unique perspective on the socio-economic, environmentally stable and financially comfortable Swiss context. The adoption of electric vehicles provides an insight into people’s inclination to make efforts towards sustainability. This emphasizes the crucial role that electric vehicle adoption plays in evaluating society’s commitment to sustainable practices.
Secondly, our commitment to environmental sustainability extends beyond a generic interest. The urgency of addressing climate change has never been more palpable, and our research seeks to contribute to the discussion on sustainable mobility.
Moreover, we believe the future is encapsulated in our dedication to understanding the dynamics of environmentally conscious consumer behavior. By unraveling the complexities of EV adoption in Switzerland over the years (2005-2022), we aim to offer valuable insights that could guide policymakers, businesses, and consumers towards more informed and sustainable choices.
We believe that by comprehensively analyzing the impact of external factors on the adoption of electric vehicles in Switzerland, we can contribute meaningfully to the ongoing global efforts towards a more sustainable and resilient future. Through this project, we aspire to inspire positive change, foster innovation, and advocate for a paradigm shift towards eco-friendly practices in the automotive sector and beyond.
The initial source contains a comprehensive overview of electric vehicles (EVs) in Switzerland, which gave us further incentive to dive deeper into the topic.The second article presents essential statistical data relating to publicly accessible recharging infrastructure for electric mobility in Switzerland.
1.3 Reseach Questions
Based on past electric vehicle adoption trends in Switzerland, can we forecast future adoption rates and pinpoint times of significant increases or decreases correlated with major events or policy changes?
In comparing regions in Switzerland, which areas show higher or lower adoption of electric vehicles, and how does this regional adoption align or vary with external factors like oil price changes, political opinions, and demographic shifts?
How has the growth of electric vehicles evolved in comparison to other countries such as France, and what factors might account for the differences in their evolution ?
To what extent does the evolution in the availability of charging stations exert an influence on the adoption of electric vehicles in Switzerland?
2 Data
2.1 Raw Datasets
Use a Loop for Similar Files.
Check for Large Data: If any of our datasets are particularly large (like the swiss_vehicle), we used a more efficient data reading librariy data.table (with fread) to speed up the data loading process.
We also added a common file path prefix (“../data/”) to make the code cleaner and easier to change for any usage in the future.
For this phase of the project, our approach involves presenting diverse datasets in a table form. To achieve this, we used the kable function from the kableExtra package, resulting in a table that displays each variable along with its corresponding definition and also added some bootstrap_otions for different stlyles.
2.1.1 Swiss Vehicle Registration Dataset
This data set explains the new registrations of road vehicles by vehicle group, canton, vehicle type, fuel, month and year in Switzerland. This data set is essential to evaluate the EVs’ evolution within our chosen location.
Code
# Create a tibble with variable descriptionsvariable_table_vehicle <-tibble(Variable =c("Canton", "Vehicle Group / Type", "Fuel", "Month", "2009-2022"),Description =c("The region in Switzerland.","Type or group of the vehicle.","Type of fuel used by the vehicle.","The month of the data.","Number of vehicles for each respective year. Each with its own column" ))# Display the table using kableExtravariable_table_vehicle %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover"))
Variable
Description
Canton
The region in Switzerland.
Vehicle Group / Type
Type or group of the vehicle.
Fuel
Type of fuel used by the vehicle.
Month
The month of the data.
2009-2022
Number of vehicles for each respective year. Each with its own column
This data set explains the evolution of the price of brent oil over time. From this data set, we will be able to obtain precious information to compute our linear regressions concerning oil price and EVs adoptions.
Code
# Create a tibble with variable descriptionsvariable_table <-tibble(Variable =c("Date", "Price"),Description =c("The date when the price was recorded. In a day-month-year format.","The price of Brent Oil on the given date." ))# Display the table using kableExtravariable_table %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover", "condensed"))
Variable
Description
Date
The date when the price was recorded. In a day-month-year format.
While this dataset contains a substantial amount of information, only a fraction of it will be pertinent to our research. It explains the evolution of the Swiss population over time through variables given below.
Code
# Create a tibble with variable descriptionsvariable_table <-tibble(Variable =c("Year", "Citizenship (category)", "Sex", "Age", "Population on 1 January", "Live birth", "Death", "Natural change", "Immigration incl. change of population type", "Emigration", "Net migration incl. change of population type", "Change of population type", "Acquisition of Swiss citizenship", "Gender change in the civil register (entry)", "Gender change in the civil register (exit)", "Statistical adjustment", "Population on 31 December", "Population change" ),Description =c("The year of the demographic data.","The categorization of the citizenship status.","Gender category.","Age category.","Population count at the beginning of the year.","Number of births in the year.","Number of deaths in the year.","Change in the population due to births and deaths.","Number of immigrants, including change of population type.","Number of emigrations.","Net migration count, including change of population type.","Change in the categorization of the population.","Number of individuals who acquired Swiss citizenship.","Number of gender changes registered (entry).","Number of gender changes registered (exit).","Adjustments made to the data for accuracy.","Population count at the end of the year.","Change in population over the year." ))# Display the table using kableExtravariable_table %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover", "condensed"))
Variable
Description
Year
The year of the demographic data.
Citizenship (category)
The categorization of the citizenship status.
Sex
Gender category.
Age
Age category.
Population on 1 January
Population count at the beginning of the year.
Live birth
Number of births in the year.
Death
Number of deaths in the year.
Natural change
Change in the population due to births and deaths.
Immigration incl. change of population type
Number of immigrants, including change of population type.
Emigration
Number of emigrations.
Net migration incl. change of population type
Net migration count, including change of population type.
Change of population type
Change in the categorization of the population.
Acquisition of Swiss citizenship
Number of individuals who acquired Swiss citizenship.
This Data set shows the evolution of Google trends on 8 different terms. We chose these terms in all 3 different main Swiss national languages (French, Italian and German) to make sure we get the most accurate data possible. Those variables will help us highlight potential tendencies/trends.
Code
# Create a tibble with variable descriptions for Google Trends datavariable_table_google_trend <-tibble(Variable =c("Month", "Search Term: Voiture électrique", "Search Term: Véhicule électrique", "Search Term: EV", "Search Term: Electric Car", "Search Term: Elektromobil", "Search Term: Elektrofahrzeug", "Search Term: Elektro Auto", "Search Term: Auto Elettrica"),Description =c("The month of the data","Google Trends data for 'Voiture électrique: (Suisse)'","Google Trends data for 'Véhicule électrique: (Suisse)'","Google Trends data for 'EV: (Suisse)'","Google Trends data for 'Electric car: (Suisse)'","Google Trends data for 'Elektromobil: (Suisse)'","Google Trends data for 'Elektrofahrzeug: (Suisse)'","Google Trends data for 'Elektro Auto: (Suisse)'","Google Trends data for 'Auto Elettrica: (Suisse)'" ))# Display the table using kableExtravariable_table_google_trend %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover", "condensed"))
Variable
Description
Month
The month of the data
Search Term: Voiture électrique
Google Trends data for 'Voiture électrique: (Suisse)'
Search Term: Véhicule électrique
Google Trends data for 'Véhicule électrique: (Suisse)'
Search Term: EV
Google Trends data for 'EV: (Suisse)'
Search Term: Electric Car
Google Trends data for 'Electric car: (Suisse)'
Search Term: Elektromobil
Google Trends data for 'Elektromobil: (Suisse)'
Search Term: Elektrofahrzeug
Google Trends data for 'Elektrofahrzeug: (Suisse)'
We decided to add this data set to help us compare with another country. Indeed, France is an adjacent country which might imply potential similarities in the outcome. This data set is made of variables such as the Date of the French vehicle registration or whether it is from a professional user or not. The information we are interested in concern the number of registered vehicles in the country by propulsion method over time.
Code
# Create a tibble with variable descriptions for vehicle categorization datavariable_table <-tibble(Variable =c("Date", "Ensemble des voitures particulieres", "Professionnel", "Particulier", "Crit'Air"),Description =c("The date, in year format, in which the observations are registered","The sum of vehicle in every groups","Category 'Professional' of vehicle","Category 'Passenger Car' of vehicle","Categorizes vehicles by emissions with from best to worst classes" ))# Display the table using kableExtravariable_table %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover", "condensed"))
Variable
Description
Date
The date, in year format, in which the observations are registered
Ensemble des voitures particulieres
The sum of vehicle in every groups
Professionnel
Category 'Professional' of vehicle
Particulier
Category 'Passenger Car' of vehicle
Crit'Air
Categorizes vehicles by emissions with from best to worst classes
This data set is meant to determine the number of charging stations per canton, per charging power, per month, per plug type in Switzerland since November 2020. It is important to mention that a station can have multiple charging points. These values will help us to obtain more information about the Swiss adaption to EVs and their potential effects on EVs adoption by the Swiss population
Code
# Create a tibble with variable descriptions in English for charging station datavariable_table_charging <-tibble(Variable =c("year","month","stations_CH_count","stations_XY_count","locations_CH_count","locations_XY_count","plugs_CH_count","plugs_XY_count","chargingPower_CH_sum","chargingPower_XY_sum","chargingPower_CH_count","chargingPower_XY_count","chargingPower_10kW_count","chargingPower_21kW_count","chargingPower_42kW_count"),Description =c("Year of the key figures collection.","Month of the key figures collection. These are collected daily and published monthly.","Number of available charging stations in Switzerland. A station can have multiple charging points.","Number of charging stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.","Number of stations in Switzerland. A station can have multiple charging points.","Number of stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.","Number of charging plugs in Switzerland.","Number of charging plugs per plug type in Switzerland. This attribute is available for all plug types, XY corresponding to the plug type according to the Open Intercharge Protocol (OICP) version 2.2 (see below).","Total maximum charging power in kilowatts of recharge stations in Switzerland. Only charging points with known power are considered (see chargingPower_CH_count).","Total maximum charging power in kilowatts of recharge stations in a canton. Only charging points with known power are considered (see chargingPower_XY_count).","Number of recharge stations in Switzerland with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_CH_sum).","Number of recharge stations per canton with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_XY_sum).","Number of recharge stations in Switzerland with known maximum charging power delivering a maximum charging power of 10 kW.","Number of recharge stations in Switzerland with known maximum charging power ranging from over 10 kW to 21 kW.","Number of recharge stations in Switzerland with known maximum charging power ranging from over 21 kW to 42 kW." ))# Display the table using kableExtravariable_table_charging %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover", "condensed"))
Variable
Description
year
Year of the key figures collection.
month
Month of the key figures collection. These are collected daily and published monthly.
stations_CH_count
Number of available charging stations in Switzerland. A station can have multiple charging points.
stations_XY_count
Number of charging stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.
locations_CH_count
Number of stations in Switzerland. A station can have multiple charging points.
locations_XY_count
Number of stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.
plugs_CH_count
Number of charging plugs in Switzerland.
plugs_XY_count
Number of charging plugs per plug type in Switzerland. This attribute is available for all plug types, XY corresponding to the plug type according to the Open Intercharge Protocol (OICP) version 2.2 (see below).
chargingPower_CH_sum
Total maximum charging power in kilowatts of recharge stations in Switzerland. Only charging points with known power are considered (see chargingPower_CH_count).
chargingPower_XY_sum
Total maximum charging power in kilowatts of recharge stations in a canton. Only charging points with known power are considered (see chargingPower_XY_count).
chargingPower_CH_count
Number of recharge stations in Switzerland with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_CH_sum).
chargingPower_XY_count
Number of recharge stations per canton with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_XY_sum).
chargingPower_10kW_count
Number of recharge stations in Switzerland with known maximum charging power delivering a maximum charging power of 10 kW.
chargingPower_21kW_count
Number of recharge stations in Switzerland with known maximum charging power ranging from over 10 kW to 21 kW.
chargingPower_42kW_count
Number of recharge stations in Switzerland with known maximum charging power ranging from over 21 kW to 42 kW.
Source - [IEA.org] (https://www.iea.org/data-and-statistics/data-tools/global-ev-data-explorer) ### 2.2.7 Complementary charging station
Code
# Create a tibble with variable descriptions for the Swiss population data setvariable_charging_station_v2 <-tibble(Variable =c("Region", "Category", "Parameter", "Mode","Powertrain", "Unit","Value"),Description =c("Either Switzerland or France","One category: Historical","One parameter: EV charging point","One mode: EV","Either Publicly available fast, or Publicly availabe slow","One unit: charging point","The total number of available charging point for a given region, year and powertrain" ))# Display the table using kableExtravariable_charging_station_v2 %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover"))
Variable
Description
Region
Either Switzerland or France
Category
One category: Historical
Parameter
One parameter: EV charging point
Mode
One mode: EV
Powertrain
Either Publicly available fast, or Publicly availabe slow
Unit
One unit: charging point
Value
The total number of available charging point for a given region, year and powertrain
# Create a tibble with variable descriptions for the Swiss population data setvariable_table_politic <-tibble(Variable =c("Years 1971-2023", "Canton", "Political parties", "Taux de participation"),Description =c("The years were elections to the Conseil National took place from 1971 to 2023","The name of the canton in Switzerland","The name of all political parties in Switzerland and their strength in a given Canton, (Canton's total = 100%), in a given year","The participation rate to the elections of the Conseil National in a given year" ))# Display the table using kableExtravariable_table_politic %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover"))
Variable
Description
Years 1971-2023
The years were elections to the Conseil National took place from 1971 to 2023
Canton
The name of the canton in Switzerland
Political parties
The name of all political parties in Switzerland and their strength in a given Canton, (Canton's total = 100%), in a given year
Taux de participation
The participation rate to the elections of the Conseil National in a given year
# Create a tibble with variable descriptions for the Swiss population datasetvariable_table_swiss_pop <-tibble(Variable =c("Canton", "Total Population", "Age Groups", "Sex", "Citizenship", "Marital Status", "Typology: Area with urban character"),Description =c("The name of the canton in Switzerland.","Total number of permanent residents in the canton.","Population broken down into age groups: 0–19, 20–64, 65 and over.","Population broken down by sex: Male and Female.","Population broken down by citizenship: Swiss and Foreigner.","Population broken down by marital status: Single, Married, Widowed, Divorced, Unmarried, In a registered partnership, Partnership dissolved.","Typology of the area based on urban character: Urban core, Area influenced by urban cores, Area beyond urban influence." ))# Display the table using kableExtravariable_table_swiss_pop %>%kbl() %>%kable_styling(position ="center", bootstrap_options =c("striped", "bordered", "hover"))
Variable
Description
Canton
The name of the canton in Switzerland.
Total Population
Total number of permanent residents in the canton.
Age Groups
Population broken down into age groups: 0–19, 20–64, 65 and over.
Sex
Population broken down by sex: Male and Female.
Citizenship
Population broken down by citizenship: Swiss and Foreigner.
Marital Status
Population broken down by marital status: Single, Married, Widowed, Divorced, Unmarried, In a registered partnership, Partnership dissolved.
Typology: Area with urban character
Typology of the area based on urban character: Urban core, Area influenced by urban cores, Area beyond urban influence.
Initially, while attempting to format the dataset, we discovered the generation of numerous NAs. Upon delving deeper into the data, we identified two distinct date formats: the first being [DD-MMM-YYYY] and the second [MMM DD, YYYY]. Consequently, the initial date formatting was ineffective. Therefore, we undertook the task of matching the two tables with the same date format.
Another issue we encountered was to present the date in a standardized form across all our dataset to perform time series analyses. For that, we had to change the abbreviated months names (i.e. Jan, Feb, etc.) into numbers.
Code
# Format 1: DD-MMM-YYYY (e.g., 15-Apr-2020)oil_df_1 <- oil_prices_data[1:8360,] %>%separate(Date, into =c("Day", "Month", "Year"), sep ="-") %>%mutate(Date =dmy(paste(Day, Month, Year)))# Format 2: MMM DD, YYYY (e.g., Apr 22, 2020)# Handling the separator with space and commaoil_df_2 <- oil_prices_data[8361:nrow(oil_prices_data),] %>%separate(Date, into =c("Month", "Day", "Year"), sep =" ", extra ="merge") %>%mutate(Day =word(Day, 1), # Extracts just the day partDate =mdy(paste(Month, Day, Year)))# Merge and filter for dates after 2005df_oil <-rbind(oil_df_1, oil_df_2) %>%filter(year(Date) >=2005) %>%select(Date, Price)# Display cleaned datareactable( df_oil, # Use your oil prices dataframecolumns =list(Date =colDef(align ="center", # Center align the date columnminWidth =150 ),Price =colDef(align ="right", # Right align the price columnstyle =function(value) {# Apply conditional formatting based on the price valueif (value >120) { color <-"green"# Green for high prices } elseif (value <30) { color <-"red"# Red for low prices } else { color <-"black"# Default color for other values }list(color = color) } ) ),highlight =TRUE, # Highlight rows on hoverdefaultPageSize =10, # Display 10 rows per pagepaginationType ="numbers", # Use numbers for page navigationsearchable =TRUE, # Make the table searchablesortable =TRUE, # Allow sortingresizable =TRUE# Allow column resizing)
2.2.2 Data Wrangling : Google trend clean
Comprehending Google Trends information involves understanding the contextualization of the provided figures. The given data is standardized which limits the range of information we can obtain from it.
The crucial element for effective narration lies in utilizing the normalized Trends data. This normalization indicates that when observing search interest trends over time, it is interpreted as a ratio relative to all searches conducted on all topics on Google during that specific period and location. Likewise, when assessing regional search interest for a specific topic, it is construed as the search interest for that particular topic in a given region relative to all searches on all topics on Google in that same place and time.
To clean and process eight Google Trends datasets related to different search terms, we created a function process_dataset() that reads each CSV file, removes date indices, excludes the first row, checks for missing values, renames columns, converts types, and standardizes the date format. Then, we iterated through each dataset, merged them based on the “Date” column, summed values for each search term per date, and normalized these values between 1 and 100.
Code
dataset_names <-c("googletrends_auto-elettrica_2005-2022.csv","googletrends_elektro-auto_2005-2022.csv","googletrends_elektrofahrzeug_2005-2022.csv","googletrends_elektromobil_2005-2022.csv","googletrends_eletric-car_2005-2022.csv","googletrends_EV_2005-2022.csv","googletrends_vehicule-electrique_2005-2022.csv","googletrends_voiture-electrique_2005-2022.csv")# Function to process each datasetprocess_dataset <-function(file_path) {# Read the dataset google_trends_data <-read.csv(file_path)# Remove the dates from the index google_trends_data <- google_trends_data |>rownames_to_column(var ="Date")# Remove the first row using slice google_trends_data <-slice(google_trends_data, -1)# Check for NAcat("NA count for", file_path, ": ", sum(is.na(google_trends_data$Date)), "\n")cat("NA count for SearchCount in", file_path, ": ", sum(is.na(google_trends_data$Catégorie...Toutes.catégories)), "\n")# Rename colcolnames(google_trends_data)[2] <-"SearchCount"# Convert to numeric google_trends_data$SearchCount <-as.numeric(google_trends_data$SearchCount)# Convert the column to date with the desired format google_trends_data$Date <-as.Date(paste(google_trends_data$Date, "01", sep ="-"))return(google_trends_data)}# Process each dataset and store in a listprocessed_datasets <-list()for (dataset_name in dataset_names) { file_path_gt <-file.path("../data", dataset_name) processed_dataset <-process_dataset(file_path_gt) processed_datasets[[dataset_name]] <- processed_dataset}#> NA count for ../data/googletrends_auto-elettrica_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_auto-elettrica_2005-2022.csv : 0 #> NA count for ../data/googletrends_elektro-auto_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_elektro-auto_2005-2022.csv : 0 #> NA count for ../data/googletrends_elektrofahrzeug_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_elektrofahrzeug_2005-2022.csv : 0 #> NA count for ../data/googletrends_elektromobil_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_elektromobil_2005-2022.csv : 0 #> NA count for ../data/googletrends_eletric-car_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_eletric-car_2005-2022.csv : 0 #> NA count for ../data/googletrends_EV_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_EV_2005-2022.csv : 0 #> NA count for ../data/googletrends_vehicule-electrique_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_vehicule-electrique_2005-2022.csv : 0 #> NA count for ../data/googletrends_voiture-electrique_2005-2022.csv : 0 #> NA count for SearchCount in ../data/googletrends_voiture-electrique_2005-2022.csv : 0# Merge datasets based on the "Date" columnmerged_data <-reduce(processed_datasets, left_join, by ="Date")# Sum the values for each search term for a given datedf_gtrends <- merged_data %>%rowwise() %>%mutate(Sum_SearchCount =sum(c_across(starts_with("SearchCount"))))# Normalize the Sum_SearchCount values between 1 and 100df_gtrends$SearchRatio <- df_gtrends$Sum_SearchCount /8# Print the resultdf_gtrends <- df_gtrends[,-(2:10)]#display cleaned datareactable( df_gtrends,columns =list(Date =colDef(align ="center", # Center align the date columnminWidth =150 ),SearchRatio =colDef(align ="right", # Right align the SearchRatio columnstyle =function(value) {if (value >45) { color <-"green"# Green for values greater than 50 } elseif (value <5) { color <-"red"# Red for values less than 5 } else { color <-"black"# Default color for other values }list(color = color) } ) ),highlight =TRUE, # Highlight rows on hoverdefaultPageSize =10, # Display 10 rows per pagepaginationType ="numbers", # Use numbers for page navigationsearchable =TRUE, # Make the table searchablesortable =TRUE, # Allow sortingresizable =TRUE# Allow column resizing)
2.2.3 Cleaning of demographic_data
The main issue with this data set was to transform it in a way that makes it useful for our analysis. We first had to remove all rows containing category summaries in the middle of the data.
We then formulated to following hypothesizes:
Male and Female present the same attitude towards electric vehicles
Swiss and Foreign citizen living in Switzerland present the same attitude towards electric vehicles
People who are not allowed to drive do not have any impact on electric vehicles use
We then decided to do the following cleaning:
Only kept the “Total” for the Male-Female category
Only kept the “Total” for the Swiss-Foreign category
Only kept people aged between 18 and 99. (People aged over 99 account for 0.02% of the population)
Then, instead of keeping a raw data set with all ages between 18 and 99, we decided to group them in widely accepted age segments, namely Generation Z, Millennials, Generation X and Baby boomers. This categorization will help us to analyse whether different generations have a different approach to their mobility or not.
Code
# Clean and process the demographic datadf_demographic <- demographic_data %>%filter(Year >=2005, Year <=2022) %>%filter(str_detect(Sex, "total"), str_detect(`Citizenship (category)`, "total")) %>%mutate(Age_clean =gsub(" years", "", Age)) %>%filter(!str_detect(Age_clean, "total")) %>%mutate(Age_num =as.numeric(Age_clean)) %>%filter(!is.na(Age_num), Age_num >=18, Age_num <=98) %>%select(Year, Age = Age_num, `Population on 1 January`) %>%mutate(Generation =cut(Age, breaks =c(17, 26, 42, 58, Inf), labels =c("Generation Z", "Millennials", "Generation X", "Baby Boomers"), include.lowest =TRUE),Year =ymd(paste(Year, "01", "01", sep ="-")) # Converting Year to Date format ) %>%group_by(Year, Generation) %>%summarise(Population =sum(`Population on 1 January`, na.rm =TRUE)) %>%pivot_wider(names_from = Generation, values_from = Population)#> Warning: There was 1 warning in `mutate()`.#> i In argument: `Age_num = as.numeric(Age_clean)`.#> Caused by warning:#> ! NAs introduced by coercionreactable( df_demographic, columns =list(Year =colDef(align ="center", # Center align the year columnminWidth =100 ),`Generation Z`=colDef(align ="right", # Right align the population columnstyle =function(value) { color <-ifelse(value >860000, "green", "black") # Green for population over 1 millionlist(color = color) } ),Millennials =colDef(align ="right",style =function(value) { color <-ifelse(value >1900000, "green", "black")list(color = color) } ),`Generation X`=colDef(align ="right",style =function(value) { color <-ifelse(value >2000000, "green", "black")list(color = color) } ),`Baby Boomers`=colDef(align ="right",style =function(value) { color <-ifelse(value >2300000, "green", "black")list(color = color) } ) ),highlight =TRUE, # Highlight rows on hoverdefaultPageSize =10, # Display 10 rows per pagepaginationType ="numbers", # Use numbers for page navigationsearchable =TRUE, # Make the table searchablesortable =TRUE, # Allow sortingresizable =TRUE# Allow column resizing)
Code
str(df_demographic)#> gropd_df [18 x 5] (S3: grouped_df/tbl_df/tbl/data.frame)#> $ Year : Date[1:18], format: "2005-01-01" ...#> $ Generation Z: int [1:18] 788674 797102 803780 819198 835811 8482..#> $ Millennials : int [1:18] 1783064 1762564 1743133 1743763 1754815..#> $ Generation X: int [1:18] 1693753 1722516 1754075 1790725 1832671..#> $ Baby Boomers: int [1:18] 1764390 1800973 1839602 1876304 1914329..#> - attr(*, "groups")= tibble [18 x 2] (S3: tbl_df/tbl/data.frame)#> ..$ Year : Date[1:18], format: "2005-01-01" ...#> ..$ .rows: list<int> [1:18] #> .. ..$ : int 1#> .. ..$ : int 2#> .. ..$ : int 3#> .. ..$ : int 4#> .. ..$ : int 5#> .. ..$ : int 6#> .. ..$ : int 7#> .. ..$ : int 8#> .. ..$ : int 9#> .. ..$ : int 10#> .. ..$ : int 11#> .. ..$ : int 12#> .. ..$ : int 13#> .. ..$ : int 14#> .. ..$ : int 15#> .. ..$ : int 16#> .. ..$ : int 17#> .. ..$ : int 18#> .. ..@ ptype: int(0) #> ..- attr(*, ".drop")= logi TRUE
2.2.4 Data Wrangling : Swiss vehicle clean
Code
# Define a function to process vehicle data, excluding the first row as it's not actual dataprocess_vehicle_data <-function(vehicle_data, start_year, end_year) {# the first row is a header or metadata and should be excluded vehicle_data <- vehicle_data[-1, ] col_names <-c("Canton", "VehicleGroupType", "Fuel", "Month", paste0("X", start_year:end_year))names(vehicle_data) <- col_names vehicle_data %>%filter(!str_detect(trimws(VehicleGroupType), "^>")) %>%rename(Location = Canton, VehicleType = VehicleGroupType) %>%mutate(VehicleType =str_remove(VehicleType, "^\\.\\.\\.\\s*")) %>%pivot_longer(cols =all_of(paste0("X", start_year:end_year)), names_to ="Year", values_to ="Count") %>%mutate(Year =as.numeric(str_remove(Year, "X")),MonthNum =match(Month, month.name),Date =as.Date(paste(Year, ifelse(is.na(MonthNum), 1, MonthNum), "01", sep ="-"), format ="%Y-%m-%d")) %>%select(-Month, -Year, -MonthNum)}# Process 2005 to 2008 and 2009 to 2022 datav_2005_2008 <-process_vehicle_data(vehicle_data_2005_2008, 2005, 2008)v_2009_2022 <-process_vehicle_data(vehicle_data_2009_2022, 2009, 2022)# Merge and arrange datadf_v <-bind_rows(v_2005_2008, v_2009_2022) %>%arrange(Date)# Canton and Fuel Type Standardizationstandard_names <-c("Switzerland"="Switzerland", "Zürich"="ZH", "Bern"="BE", "Luzern"="LU","Uri"="UR", "Schwyz"="SZ", "Obwalden"="OW", "Nidwalden"="NW","Glarus"="GL", "Zug"="ZG", "Fribourg"="FR", "Solothurn"="SO","Basel-Stadt"="BS", "Basel-Landschaft"="BL", "Schaffhausen"="SH","Appenzell Ausserrhoden"="AR", "Appenzell Innerrhoden"="AI", "St. Gallen"="SG", "Graubünden"="GR", "Aargau"="AG", "Thurgau"="TG", "Ticino"="TI", "Vaud"="VD", "Valais"="VS","Neuchâtel"="NE", "Genève"="GE", "Jura"="JU", "Confederation"="Confederation")df_v <- df_v %>%mutate(Location =iconv(Location, from ="latin1", to ="UTF-8")) %>%mutate(Location =map_chr(str_split(Location, " / "), ~ .x[1])) %>%mutate(Location = standard_names[Location],Fuel =case_when( Fuel %in%c("Petrol-electricity: conventional hybrid", "Diesel-electricity: conventional hybrid") ~"Conventional hybrid", Fuel %in%c("Petrol-electricity: plug-in hybrid", "Diesel-electricity: plug-in hybrid") ~"Plug-in hybrid", Fuel =="Gas (monovalent and bivalent)"~"Gas",TRUE~ Fuel))# Count number of vehicles for a particular year#vehicle_count_2022 <- df_v %>%# filter(Location == "Switzerland", year(Date) == 2023, VehicleType == "Passenger car") %>%# summarize(TotalCount = sum(Count))
Redundancy
The dataset, representing new vehicle registrations in Switzerland from 2005 onwards, exhibited redundancy in its structure. For example, the data contained both main categories (like ‘> Passenger cars’) and their respective sub-categories (‘… Passenger car’ and ‘.. Heavy passenger car’). On examination, it was observed that the counts under the main categories were simply aggregates of the counts of their sub-categories. Such redundancy could lead to double counting in analytical processes.
We addressed this issue by removing main categories that were aggregates of their sub-categories, preventing potential double counting in analyses. For two subsets of data (2005 to 2008 and 2009 to 2022), We filtered out main categories, cleaned and standardized the column names, reshaped the data set from wide to long format to facilitate analysis, and extracted and formatted the year information for each observation. This process allowed for a more granular and consistent data set for further analysis and interpretation.
Merging and Joining
As the vehicle data was split across multiple files, it was necessary to combine them for a holistic view.
We used bind_rowsto create df_v, then sorted the dataset by year to ensure a structured chronological view of the data.
Refinement and Vehicle Classification Insights
Some canton names in the dataset had special characters not properly encoded, displaying as hexadecimal escape sequences. To correct this, we used iconv to properly encode these characters from “latin1” to “UTF-8”. Additionally, where multiple names were provided for cantons (separated by slashes), we retained only the first name. We standardized the canton names using a predefined list (standard_names).
We unified the date representation by combining ‘Year’ and ‘Month’ information, matching month names to their corresponding numeric values, then merged this information with the year, and formatted it as “YYYY-MM-DD” to generate a standard date format. This transformation streamlines time series analyses and provides a more intuitive representation of the data’s temporal dimension.
We simplified the classification of hybrid vehicles by merging different categories of conventional and plug-in hybrids into two main groups: “Conventional hybrid” and “Plug-in hybrid”, respectively. This simplification aimed to streamline analysis by grouping similar types together.
The “Gas” fuel type was renamed from “Gas (monovalent and bivalent)” for clarity and consistency in the dataset.
Further context was provided on different hybrid types:
Conventional Hybrids: These vehicles, powered by petrol or diesel without plug-in capability, might exhibit different adoption trends due to their longer existence in the market.Indeed, they have been around for longer and may have a different adoption trend compared to newer technologies.
Plug-in Hybrids: With larger batteries and the ability to charge from an electric outlet, these vehicles enable all-electric operation for limited distances. The presence of charging infrastructure might impact their adoption differently, distinguishing them from conventional hybrids. Adoption might be reflected differently due to the need for charging infrastructure.
Rows with 0
Retaining rows with a Count of 0 ensures the completeness of our dataset, indicating periods when specific vehicle types or fuel categories had no registrations. This not only provides a holistic view of vehicle adoption trends over time but also aids in generating continuous time series visualizations without gaps, offering a true representation of the data.
This dataset will be used to compare our data in Switzerland with a bordering country (FR)
It is important to note that this data set contains all registered vehicles in a given year (vs. all new registered cars for our Swiss data) We will only keep the vehicles used for personal use (not the ones for professional use), to match our Swiss dataset
We decided to remove some anecdotal observations (i.e Hydrogen, Unknown, etc.) as their production / use is marginal. Moreover, we have decided to regroup the different kind of hybrid motors into two Hybrid categories. The 2 categories are the plugable hybrid vehicle HR (petrol or diesel) and the non-plugable hybrid vehicles HNR (petrol or diesel). This was done to match our Swiss data set.
The main issue with this data set was to set it in a way where it is comparable to our Swiss cars data set. Indeed, this dataset presents the total of all cars in France for a given year and type, while our Swiss data set accounts for new registration of cars for a given year and type.
To be able to compare two data sets, we decided to format the French data set in the following way:
Remove categories that are not relevant for our analysis, and widely under-represented in the population (i.e Gaz, hydrogen)
Focus on the private use of cars (vs. professional), because the decision to use a certain type of cars for professional reasons are often explained by factors hard to capture in our analysis (i.e bundle deals, greenwashing). We therefore hypothesized that private use of a certain type of car better gives insights on population view.
The match our swiss data set (new registration per year), we computed the difference of total cars registered in France per years. Having only the data between 2011 and 2022, we have “lost” the insights for 2011. Indeed, when creating the deltas (difference) columns, 2011 deltas could not be computed (no data for 2010), we therefore decided to delete that year from our data
Code
# Only taking the private cars from this data and transforming the datadf_v_fr <- france_v[25:37, ] %>%t() %>%as_tibble(.name_repair ="minimal") %>%setNames(.[1, ]) %>%slice(-1) %>%mutate(Year =2011:2022) %>%select(-c("Particulier", "Gaz", "Gaz HNR", "Gaz HR", "Hydrogène et autre ZE", "Inconnu")) %>%mutate(across(-Year, ~floor(as.numeric(.)))) %>%mutate(Conventional_Hybrid =as.numeric(`Diesel HNR`) +as.numeric(`Essence HNR`),Plug_in_Hybrid =as.numeric(`Diesel HR`) +as.numeric(`Essence HR`),across(c(Diesel, Essence, Conventional_Hybrid, Plug_in_Hybrid, Electrique), as.numeric),Diesel_delta = Diesel -lag(Diesel),Essence_delta = Essence -lag(Essence),Conventional_Hybrid_delta = Conventional_Hybrid -lag(Conventional_Hybrid),Plug_in_Hybrid_delta = Plug_in_Hybrid -lag(Plug_in_Hybrid),Electrique_delta = Electrique -lag(Electrique) ) %>%filter(!is.na(Diesel_delta)) %>%select(Date = Year, Diesel, Diesel_delta, Essence, Essence_delta, Conventional_Hybrid, Conventional_Hybrid_delta, Plug_in_Hybrid, Plug_in_Hybrid_delta, Electrique, Electrique_delta) %>%mutate(Date =as.Date(paste(Date, "-01-01", sep =""), format ="%Y-%m-%d"))# Display cleaned datareactable(head(df_v_fr, 100), sortable =TRUE, searchable =TRUE)
2.2.6 Availability of charging stations
This data set will be used to explore and analyze the effects of charging station availability on EVs adoption.
However, two important points are to note.
We were unable to find data before November 2020
Charging station availability and EV in market are likely strongly reciprocally correlated.
Moreover, it is important to keep standardized dates and locations
We merges ‘year’ and ‘month’ columns to create a ‘Date’ column in the format “YYYY-MM-01”, subsequently reorganizing the dataframe to position ‘Date’ as the first column and removing the original ‘year’ and ‘month’ columns.
Code
# Combine 'year' and 'month' columns to create a new 'Date' columncharging_station$Date <-as.Date(paste(charging_station$year, charging_station$month, "01", sep ="-"), format ="%Y-%m-%d")# Rearrange columns with 'Date' as the first column and drop 'year' and 'month'df_charging_station <- charging_station %>%select(Date, everything()) %>%select(-year, -month)# Display cleaned datareactable(df_charging_station,sortable =TRUE, searchable =TRUE)
2.2.7 Availability of charging station in France and Switzerland
This data set is used to complement the prior charging station data set.
It has data going back to 2012. And data for both Switzerland and France
Code
# Only selecting necessary columnscharge_ch_fr <- charge_ch_fr %>%select(c("year", "region", "powertrain", "value"))# Setting year as.Datecharge_ch_fr$year <-as.Date(paste(charge_ch_fr$year,"-01-01", sep =""), format ="%Y-%m-%d")df_charge_number_CH <- charge_ch_fr %>%filter(region =="Switzerland")df_charge_number_FR <- charge_ch_fr %>%filter(region =="France")
2.2.8 Political strength per canton
This data set will be used to explore and analyze the effects of political convictions on EVs adoption (per Canton).
The 2 main difficulties with this data set were the following: 1. Create a process (loop) general enough to clean all the Cantons at the same time, but specific enough for it to work 2. The Swiss political parties’ names have evolved quite a bit over the years.
To use this data effectively in our analysis. We will group them in 5 categories.
The 5 categories respond to the following question: “What is the political party’s stance on sustainability / ecological measures?” 1. Against 2. Slightly Against 3. Neutral 4. Slightly in Favour 5. In Favour
We assigned each political parties in one of those categories based on their program.
Code
# We import each different sheets (one per canton) into a data setpolitical_data_sheets <-lapply(setdiff(excel_sheets("../data/political_data.xlsx"), "Contenu"), function(sheet) { p_data <-suppressMessages(read_excel("../data/political_data.xlsx", sheet = sheet))return(p_data) })sheet_names <-setdiff(political_data_sheets_prep, "Contenu")named_data_list <-setNames(political_data_sheets, sheet_names)# Now to the cleaning partfor (i inseq_along(named_data_list)) {# setting one dataset we work on current_dataset <- named_data_list[[i]]# where is "taux de participation" index_to_keep <-which(current_dataset[[1]] =="Taux de participation")[1]# keep only rows until "taux de participation" and delete the first one "force des partis" named_data_list[[i]] <- current_dataset[2:index_to_keep-2, ]# remove columns 2 and 3 (they are all NAs) named_data_list[[i]] <- named_data_list[[i]][ ,-c(2,3)]# transposing the data sets to have years in a single columns transposed_data <-t(named_data_list[[i]]) named_data_list[[i]] <-as_tibble(transposed_data[-1, ])# set column names and change date as.Datecolnames(named_data_list[[i]]) <- transposed_data[1, ] named_data_list[[i]][[2]] <-as.Date(paste(named_data_list[[i]][[2]], "-01-01", sep =""), format ="%Y-%m-%d")# find the columns that are only NAs and remove them named_data_list[[i]] <- named_data_list[[i]][, colSums(!is.na(named_data_list[[i]])) >0, drop =FALSE]# Removing all non-numeric values named_data_list[[i]][, -1] <-apply(named_data_list[[i]][, -1], 2, function(x) as.numeric(gsub("[^0-9.]", "", x)))# Setting the first column name as "Date"colnames(named_data_list[[i]])[1] <-"Date"# Removing the rows before 01.01.1999 threshold_date <-as.Date("1999-01-01") named_data_list[[i]] <- named_data_list[[i]][named_data_list[[i]]$Date >= threshold_date, ]}#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column#> names if `.name_repair` is omitted as of tibble 2.0.0.#> i Using compatibility `.name_repair`.#> Warning in FUN(newX[, i], ...): NAs introduced by coercion# Creating a map for the political parties stance on sustainabilitysustainability_mapping <-c("Against"=c("MCG (MCR)","PBD", "PBD 1", "PBD 2", "UDC"),"Slightly Against"=c("Lega","PDC", "PDC 1", "PDC 2", "PLR", "PLS", "UDF"),"Neutral"=c("Adl", "DS", "PdL", "POCH", "PSL", "PST", "Rép.", "Sol.","Separ.", "Autres"),"Slightly in Favour"=c("Il Centro", "Il Centro 1", "Le Centre 1","Le Centre", "Le Centre 2", "PCS","PPD", "PPD 1"),"In Favour"=c("AVF", "AVF 1", "PEV", "PS", "PSA", "PVL","VERDI", "VERDI 2","VERT-E-S ", "VERT-E-S 2", "VERT-E-S 3"))# Looping through the data sets the change the names of the parties by their stancefor (i inseq_along(named_data_list)){ current_dataset <- named_data_list[[i]]for (party_name innames(current_dataset)[-1]) { stance <-sapply(sustainability_mapping, function(x) party_name %in% x) stance <-names(stance)[which(stance)]if (length(stance) >0) { selected_columns <-intersect(c(party_name, stance), colnames(current_dataset)) current_dataset[[stance]] <-rowSums(current_dataset[selected_columns], na.rm =TRUE) current_dataset[[party_name]] <-NULL } } named_data_list[[i]] <- current_dataset}# Creating my 5 columnscategories <-c("Against", "Slightly Against", "Neutral", "Slightly in Favour", "In Favour")# Aggregating the information of the data sets into these 5 categoriesfor (i inseq_along(named_data_list)) { current_dataset <- named_data_list[[i]]# Extract the Date column result_dataset <- current_dataset[, "Date", drop =FALSE]# Loop through each category and aggregate valuesfor (category in categories) { matching_columns <-grep(paste0("^", category, "\\d*$"), colnames(current_dataset), value =TRUE) result_dataset[[category]] <-rowSums(current_dataset[matching_columns], na.rm =TRUE) } named_data_list[[i]] <- result_dataset}# Storing the data sets into a list:list_politic <-list()for (i inseq_along(named_data_list)) { tibble_name <-paste0("politic_", sheet_names[i]) list_politic[[tibble_name]] <- named_data_list[[i]]}# Correcting a NA in politic_AInew_dates <- list_politic[["politic_VD"]]$Datelist_politic[["politic_AI"]]$Date <- new_dates# Creating a data set per year for Switzerlandpolitical_combined_data <-bind_rows(list_politic, .id ="Canton")political_combined_data$Year <-as.integer(format(political_combined_data$Date, "%Y"))political_combined_data <- political_combined_data[, -which(names(political_combined_data) =="Date")]political_summarized_data <- political_combined_data %>%group_by(Year, Canton) %>%summarize(Against =sum(Against),`Slightly Against`=sum(`Slightly Against`),Neutral =sum(Neutral),`Slightly in Favour`=sum(`Slightly in Favour`),`In Favour`=sum(`In Favour`) )yearly_political_datasets <-list()unique_years <-unique(political_combined_data$Year)for (year in unique_years) { year_political_dataset <- political_combined_data %>%filter(Year == year) yearly_political_datasets[[as.character(year)]] <- year_political_dataset}# now accessible via ' political_yearly_data$Year 'political_combined_data <- political_combined_data %>%mutate(Canton =sub("politic_", "", Canton), # Remove 'politic_' prefixYear =ymd(paste(Year, "01", "01")) # Convert Year to date type )political_combined_data <- political_combined_data %>%mutate(KANTONSNUM =case_when( Canton =="ZH"~1, Canton =="BE"~2, Canton =="LU"~3, Canton =="UR"~4, Canton =="SZ"~5, Canton =="OW"~6, Canton =="NW"~7, Canton =="GL"~8, Canton =="ZG"~9, Canton =="FR"~10, Canton =="SO"~11, Canton =="BS"~12, Canton =="BL"~13, Canton =="SH"~14, Canton =="AR"~15, Canton =="AI"~16, Canton =="SG"~17, Canton =="GR"~18, Canton =="AG"~19, Canton =="TG"~20, Canton =="TI"~21, Canton =="VD"~22, Canton =="VS"~23, Canton =="NE"~24, Canton =="GE"~25, Canton =="JU"~26 ) )# Display cleaned datareactable(political_combined_data,sortable =TRUE, searchable =TRUE)
Code
str(political_combined_data)#> tibble [182 x 8] (S3: tbl_df/tbl/data.frame)#> $ Canton : chr [1:182] "ZH" "ZH" "ZH" "ZH" ...#> $ Against : num [1:182] 32.5 33.4 33.9 35.1 34.3 ...#> $ Slightly Against : num [1:182] 24.9 23.7 22.8 18.8 21.6 ...#> $ Neutral : num [1:182] 7.46 12.36 9.09 2.64 2.8 ...#> $ Slightly in Favour: num [1:182] 0.218 0 0.147 0.157 0 ...#> $ In Favour : num [1:182] 35 39.7 42 43.2 41.3 ...#> $ Year : Date[1:182], format: "1999-01-01" ...#> $ KANTONSNUM : num [1:182] 1 1 1 1 1 1 1 2 2 2 ...
2.2.9 Swiss Population
We chose to use the population data from the year 2022 to ensure the most current and relevant demographic context, providing a contemporary snapshot that aligns closely with the latest trends in electric vehicle registrations.
Assuming the relevant data starts from row 5, canton names are in the first column, and the population figures for 2022 are in the second column.
Code
df_swisspop_2022 <- df_swisspop_2022 %>%slice(-1:-4) %>%# Remove the first 4 rowsselect(Canton =1, TotalPopulation =2) # Select only the canton names and population figures# Remove rows with NAs in the Canton columndf_swisspop_2022 <- df_swisspop_2022 %>%filter(!is.na(Canton))# Map Canton names to abbreviationsdf_swisspop_2022 <- df_swisspop_2022 %>%mutate(CantonAbbreviation =case_when( Canton =="Zurich"~"ZH", Canton =="Bern"~"BE", Canton =="Lucerne"~"LU", Canton =="Uri"~"UR", Canton =="Schwyz"~"SZ", Canton =="Obwalden"~"OW", Canton =="Nidwalden"~"NW", Canton =="Glarus"~"GL", Canton =="Zug"~"ZG", Canton =="Fribourg"~"FR", Canton =="Solothurn"~"SO", Canton =="Basel-Stadt"~"BS", Canton =="Basel-Landschaft"~"BL", Canton =="Schaffhausen"~"SH", Canton =="Appenzell A. Rh."~"AR", Canton =="Appenzell I. Rh."~"AI", Canton =="St. Gallen"~"SG", Canton =="Graubünden"~"GR", Canton =="Aargau"~"AG", Canton =="Thurgau"~"TG", Canton =="Ticino"~"TI", Canton =="Vaud"~"VD", Canton =="Valais"~"VS", Canton =="Neuchâtel"~"NE", Canton =="Geneva"~"GE", Canton =="Jura"~"JU",TRUE~NA_character_# For unrecognized cantons ))# Map Canton names to KANTONSNUMdf_swisspop_2022 <- df_swisspop_2022 %>%mutate(KANTONSNUM =case_when( Canton =="Graubünden"~18, Canton =="Bern"~2, Canton =="Valais"~23, Canton =="Vaud"~22, Canton =="Ticino"~21, Canton =="St. Gallen"~17, Canton =="Zurich"~1, Canton =="Fribourg"~10, Canton =="Lucerne"~3, Canton =="Aargau"~19, Canton =="Uri"~4, Canton =="Thurgau"~20, Canton =="Schwyz"~5, Canton =="Jura"~26, Canton =="Neuchâtel"~24, Canton =="Solothurn"~11, Canton =="Glarus"~8, Canton =="Basel-Landschaft"~13, Canton =="Obwalden"~6, Canton =="Nidwalden"~7, Canton =="Geneva"~25, Canton =="Schaffhausen"~14, Canton =="Appenzell A. Rh."~15, Canton =="Zug"~9, Canton =="Appenzell I. Rh."~16, Canton =="Basel-Stadt"~12,TRUE~NA_integer_# For unrecognized cantons ))df_swisspop_2022$TotalPopulation <-as.integer(df_swisspop_2022$TotalPopulation)# Filter out rows where CantonAbbreviation is NAdf_swisspop_2022 <- df_swisspop_2022 %>%filter(!is.na(CantonAbbreviation))# Display cleaned datareactable(df_swisspop_2022,sortable =TRUE, searchable =TRUE)
3 Exploratory data analysis
3.1 Switzerland
3.1.1 seasonality
Code
#creating dataset for the three seasonality graphspassenger_cars_processed <- df_v %>%filter(VehicleType =="Passenger car") %>%mutate(YearMonth =floor_date(Date, "month")) %>%group_by(YearMonth) %>%summarise(Count =sum(Count, na.rm =TRUE)) %>%ungroup() %>%mutate(Year =year(YearMonth), Month =factor(month(YearMonth), levels =1:12, labels = month.abb)) %>%arrange(Year, Month)# plot1p_seaso_1 <-ggplot(passenger_cars_processed, aes(x = YearMonth, y = Count)) +geom_line(color ="darkblue", size =0.5) +labs(title ="Passenger Car Adoption Over Time in Switzerland",x ="Date",y ="Number of Passenger Cars Registered") +theme_minimal() +scale_x_date(date_breaks ="1 year", date_labels ="%Y") +geom_smooth(method ="loess", se =FALSE, color ="blue") +theme(axis.text.x =element_text(angle =45, hjust =1))#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.#> i Please use `linewidth` instead.# Convert to an interactive plotly objectinteractive_plot_seaso_1 <-ggplotly(p_seaso_1, width =600, height =400)# Display the interactive plotinteractive_plot_seaso_1
The blue line represents a smoothed trend, indicating an initial increase in car registrations until around 2014-2015, followed by a gradual decline. The black line shows the actual number of cars registered, with significant variability. We will call this variation seasonality which is better represented in the following graph.
Code
# Plotting the data with ggplot2, showing the trend within each yearp_seaso_2 <-ggplot(passenger_cars_processed, aes(x = Month, y = Count, group = Year, color =as.factor(Year))) +geom_smooth(se =FALSE, method ="loess", span =0.5, size =0.7) +labs(title ="Monthly Passenger Car Registrations by Year",x ="Month",y ="Number of Passenger Cars Registered",color ="Year") +theme_minimal() +scale_color_viridis_d() +theme(legend.position ="bottom", axis.text.x =element_text(angle =45, hjust =1))# Convert to an interactive plotly objectinteractive_plot_seaso_2 <-ggplotly(p_seaso_2, width =600, height =400)# Adjust plotly settings if needed, such as margins or layoutinteractive_plot_seaso_2 <- interactive_plot_seaso_2 %>%layout(margin =list(l =40, r =10, b =40, t =40), # Adjust marginslegend =list(orientation ="h", x =0, xanchor ="left", y =-0.2)) # Adjust legend position# Display the interactive plotinteractive_plot_seaso_2
This pattern suggests a seasonal trend with a mid-year peak and a year-end increase. 2020 reacts differently than other years. We suggest that it is probably related to Covid policies.
Code
# Plotting the data with ggplot2, showing the trend within each yearp_seaso_3 <-ggplot(passenger_cars_processed, aes(x = Month, y = Count, group = Year, color =as.factor(Year))) +geom_line() +facet_wrap(~ Year, scales ="free_y") +# Facet by year with free y scaleslabs(title ="Seasonal Trends in Passenger Car Registrations",x ="Month",y ="Number of Passenger Cars Registered") +theme_minimal() +scale_color_viridis_d(guide =FALSE) +# Use viridis color scale and remove the guide/legendtheme(axis.text.x =element_text(angle =45, hjust =1)) +# Rotate x-axis labels by 45 degreestheme(axis.text.x =element_blank(), # This will remove the month labelsaxis.text.y =element_blank(), # This will remove the month labelsaxis.ticks.x =element_blank(), # This will remove the ticks on the x-axislegend.position ="none") # Remove the legend to clean up the plot# Convert to an interactive plotly objectinteractive_plot_seaso_3 <-ggplotly(p_seaso_3, width =600, height =400) %>%layout(xaxis =list(tickmode ="array",tickvals =1:12,ticktext = month.abb))# Display the interactive plotinteractive_plot_seaso_3
This version of the graphs helps us visualizing the evolution of registration for each year individually. As mentioned above, 2020 is the only year which show a decreasing peak instead of an increasing one as it is presented for all other years.
3.1.2 Vehicule Registration by Fuel time over time
Code
# Filter df_v for specific fuel types and vehicle typefiltered_df <- df_v %>%filter(Fuel %in%c("Petrol", "Diesel", "Conventional hybrid", "Plug-in hybrid", "Electricity") & VehicleType =="Passenger car")# Group by Date and Fuel type, and summarize the countfuel_type_trends <- filtered_df %>%group_by(Date, Fuel) %>%summarize(Count =sum(Count, na.rm =TRUE), .groups ='drop')# Plotting the trends over time by fuel typep_fuel_over_time <-ggplot(fuel_type_trends, aes(x = Date, y = Count, color = Fuel)) +geom_line(alpha =0.3) +# Plot the actual linesgeom_smooth(aes(group = Fuel), se =FALSE, method ="loess", span =0.1) +# Add smoothed lines per fuel typelabs(title ="Vehicle Registrations by Fuel Type Over Time",x ="Date",y ="Number of Vehicles Registered") +theme_minimal() +theme(legend.position ="bottom") # Adjust legend position to the bottom# Convert to an interactive plotly objectinteractive_plot_fuel_over_time <-ggplotly(p_fuel_over_time, width =600, height =400)# Adjust plotly settings interactive_plot_fuel_over_time <- interactive_plot_fuel_over_time %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))# Display the interactive plotinteractive_plot_fuel_over_time
The graph above shows the vehicle registrations in Switzerland by fuel type from 2005 to 2023.
On the one hand, it highlights the recent decrease in new registrations of thermic vehicles since 2017. On the other hand, the three EVs named as Conventional and Plug-in hybrids along with electricity ones are gradually increasing since approximately the same period. (In fact, fully electric cars and Conventional hybrids have even reached a higher level of vehicle registration in the past years, spotlighting the EV tendency in recent years.)
3.1.3 Availability of Charging station
3.1.3.1 Availability of Charging station in Switzerland
Code
p_charging_station <-ggplot(df_charge_number_CH, aes(x = year, y = value, group = powertrain, color = powertrain)) +geom_line() +labs(title ="Available charging station in Switzerland over the years",x ="Year",y ="Number of Charging Stations") +theme_minimal()# Convert to an interactive plotly objectinteractive_plot_charging_station <-ggplotly(p_charging_station, width =600, height =400)# Adjust plotly settings interactive_plot_charging_station <- interactive_plot_charging_station %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))# Display the interactive plotinteractive_plot_charging_station
3.1.3.2 Availability of Charging station in France
Code
p_charging_station_fr <-ggplot(df_charge_number_FR, aes(x = year, y = value, group = powertrain, color = powertrain)) +geom_line() +labs(title ="Available charging station in France over the years",x ="Year",y ="Number of Charging Stations") +theme_minimal()# Convert to an interactive plotly objectinteractive_plot_charging_station_fr <-ggplotly(p_charging_station_fr, width =600, height =400)# Adjust plotly settings interactive_plot_charging_station_fr <- interactive_plot_charging_station_fr %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))# Display the interactive plotinteractive_plot_charging_station_fr
3.1.4 Map
3.1.4.1 Count of Electricity car Registration for all years per cantons
swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp contains the boundaries of the cantons of Switzerland.
The choice to display the sum of electric vehicle registrations over all years on the map was made to provide a comprehensive historical perspective, highlighting the total adoption of electric vehicles in each Swiss canton since the beginning of the dataset’s timeframe. This approach effectively captures the cumulative impact of electric vehicle adoption across the country.
The process involved filtering vehicle registration data for electric passenger cars and summing up the total registrations for each Swiss canton. This data was then combined with population figures and geographic boundaries to calculate electric vehicle (EV) registrations per capita, providing a standardized comparison of EV adoption across cantons.
Code
# Read in the shapefile for Swiss cantonsswiss_cantons <-st_read("../data/CH_map/swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp")#> Reading layer `swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET' from data source `C:\Users\UrsHu\Pillars\Learn\Academic\Master\Semestre 1\Data Science\0_group_project\dsfba_project\data\CH_map\swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp' #> using driver `ESRI Shapefile'#> Simple feature collection with 50 features and 20 fields#> Geometry type: POLYGON#> Dimension: XYZ#> Bounding box: xmin: 2490000 ymin: 1080000 xmax: 2830000 ymax: 1300000#> z_range: zmin: 193 zmax: 4610#> Projected CRS: CH1903+ / LV95 + LN02 height# Define canton abbreviations for matchingabbreviation_values <-c("ZH", "BE", "LU", "UR", "SZ", "OW", "NW", "GL", "ZG", "FR", "SO", "BS", "BL", "SH", "AR", "AI", "SG", "GR", "AG", "TG", "TI", "VD", "VS", "NE", "GE", "JU")# Prepare the EV data with sum over all yearsdf_v_map <- df_v %>%filter(!Location %in%c("Switzerland", "Confederation"), Fuel =="Electricity", VehicleType =="Passenger car") %>%mutate(KANTONSNUM =match(Location, abbreviation_values)) %>%group_by(KANTONSNUM) %>%summarize(TotalEV =sum(Count), .groups ='drop')# Merge EV data with population datadf_v_map <-left_join(df_v_map, df_swisspop_2022, by =c("KANTONSNUM"="KANTONSNUM"))str(df_v_map)#> tibble [26 x 5] (S3: tbl_df/tbl/data.frame)#> $ KANTONSNUM : num [1:26] 1 2 3 4 5 6 7 8 9 10 ...#> $ TotalEV : int [1:26] 29541 11652 5365 325 2960 551 690..#> $ Canton : chr [1:26] "Zurich" "Bern" "Lucerne" "Uri" ...#> $ TotalPopulation : int [1:26] 1579967 1051437 424851 37317 1649..#> $ CantonAbbreviation: chr [1:26] "ZH" "BE" "LU" "UR" ...# Calculate EV registrations per capitadf_v_map <- df_v_map %>%mutate(EV_per_Capita = TotalEV / TotalPopulation)# Merge with shapefile datamap_data <-left_join(swiss_cantons, df_v_map, by ="KANTONSNUM")# Ensure the geometries are valid and the CRS is set to WGS 84# Check if 'map_data' is already an sf objectif (!inherits(map_data, "sf")) { map_data_sf <-st_as_sf(map_data, wkt ="geometry")} else { map_data_sf <- map_data}# Ensure the geometries are valid and the CRS is setmap_data_sf <-st_make_valid(map_data_sf)# Reproject the data to WGS 84 (EPSG:4326)map_data_sf <-st_transform(map_data_sf, crs =4326)# Create color palettes for the 'Total' and 'EV_per_Capita' columnscolor_palette_total <-colorNumeric(palette ="viridis", domain = map_data_sf$TotalEV)color_palette_per_capita <-colorNumeric(palette ="viridis", domain = map_data_sf$EV_per_Capita)# Create the leaflet mapsleaflet_map_total <-leaflet(map_data_sf) %>%addProviderTiles(providers$CartoDB.Positron) %>%addPolygons(fillColor =~color_palette_total(TotalEV),weight =1,color ="#FFFFFF",fillOpacity =0.7,popup =~paste(NAME, "<br>Total EV Registrations: ", TotalEV) ) %>%addLegend(pal = color_palette_total, values =~TotalEV, opacity =0.7, title ="Total EV <br> Registrations",position ="topright" )leaflet_map_per_capita <-leaflet(map_data_sf) %>%addProviderTiles(providers$CartoDB.Positron) %>%addPolygons(fillColor =~color_palette_per_capita(EV_per_Capita),weight =1,color ="#FFFFFF",fillOpacity =0.7,popup =~paste(NAME, "<br>EV Registrations per Capita: ", round(EV_per_Capita, 3)) ) %>%addLegend(pal = color_palette_per_capita, values =~EV_per_Capita, opacity =0.7, title ="EV Registrations <br> per Capita",position ="topright" )# Print the maps to view themleaflet_map_total
3.1.4.2 Count of Electricity car Registration for all years per cantons Standardized
For example, Zurich (ZH) has a relatively lower EV_per_Capita value (0.01870) despite a high total number of EV registrations (29,541), due to its large population (1,579,967). In contrast, Zug (ZG) shows a higher EV_per_Capita (0.04666) with fewer EV registrations (6,120) but a much smaller population (131,164), indicating a greater adoption rate when adjusted for population size.
The limitation of this approach is that it considers the total population, not accounting for the segment of the population that is of driving age or interested in vehicle ownership, which could further refine the EV adoption rates.
Code
leaflet_map_per_capita
3.2 Google Trend
Code
p_gtrends <-ggplot(df_gtrends, aes(x = Date, y = SearchRatio)) +geom_line(color ="darkgreen", size =1, stat='smooth', se =FALSE, method ="loess", span =0.1, size =1) +labs(x ="Date", y ="Google Search", title ="Google search About EV in Switzerland")#> Warning: Duplicated aesthetics after name standardisation: size# Convert to an interactive plotly objectinteractive_plot_gtrends <-ggplotly(p_gtrends, width =600, height =400)# Adjust plotly settings interactive_plot_gtrends <- interactive_plot_gtrends %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))interactive_plot_gtrends
Another interesting point to look out is the Google search about EV engine in Switzerland. As we can see, it seems to have skyrocketed since 2016. This allows us to get an overview of the Swiss population interest concerning this topic and comfort us towards our initial predictions.
3.3 Oil
The chart below represents the oil price evolution through the last two decades. We can observe a quite high degree of volatility concerning the oil valuation. Nevertheless, it seems important to highlight that this value has greatly increased since the past 3 years.
The fig.show='animate' option tells Quarto to render the plot as an animation.
Code
## basic plot#ggplot(df_oil, aes(x = Date , y = Price)) +# geom_line(color = "darkred", size = 1) +# labs(x = "Date", y = "Price", title = "Oil Price Over Time")# Create a ggplot object with your datap <-ggplot(df_oil, aes(x = Date, y = Price, group =1)) +geom_line(color ="darkred", size =1) +labs(x ="Date", y ="Price", title ="Oil Price Over Time")# Animate the plot with gganimate, revealing the line over timeanimated_plot <- p +transition_reveal(Date)# Render the animationanimate(animated_plot, renderer = gganimate::gifski_renderer(), width =600, height =400, res =96)
3.4 Demographics
Code
demographic_data_long <- df_demographic %>%pivot_longer(cols =c('Generation Z', 'Millennials', 'Generation X', 'Baby Boomers'),names_to ="Generation",values_to ="Population" )# Plotting the data with ggplot2p_demog <-ggplot(demographic_data_long, aes(x = Year, y = Population, color = Generation)) +geom_line(size =1) +labs(title ="Demographic Trends in Switzerland",x ="Year",y ="Population") +theme_minimal() +theme(legend.position ="bottom") +scale_color_manual(values =c("Generation Z"="blue", "Millennials"="red", "Generation X"="green", "Baby Boomers"="purple"))# Convert the ggplot object to an interactive plotly objectinteractive_plot_demog <-ggplotly(p_demog, width =600, height =400)# Adjust plotly settings interactive_plot_demog <- interactive_plot_demog%>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))# The interactive plot can be displayed in an R Markdown document or a Shiny appinteractive_plot_demog
This graph concentrates on the demographic trends’ evolution for each segment of age above mentioned throughout the time. It points out the important overall rise of individuals living in Switzerland over the past two decades.
3.5 French vehicles
3.5.1 Total vehicles evolution France
Code
# Reshape data to long formatlong_registration_data <- df_v_fr %>%select(Date, Diesel, Essence, Conventional_Hybrid, Plug_in_Hybrid, Electrique) %>%gather(key ="Fuel_Type", value ="Count", -Date)# Create ggplotp <-ggplot(long_registration_data, aes(x = Date, y =log(Count), color = Fuel_Type)) +geom_line() +scale_color_manual(values =c("Diesel"="red", "Essence"="blue", "Conventional_Hybrid"="green", "Plug_in_Hybrid"="purple", "Electrique"="orange")) +labs(x ="Date", y ="Log-Scale Count", color ="Fuel Type") +theme_minimal()# Convert to interactive plot and adjust legendggplotly(p, width =600, height =400) %>%layout(legend =list(orientation ='h', x =0.5, xanchor ='center', y =-0.15))
3.5.2 Deltas evolution
Code
# Reshape data to long formatlong_df_v_fr <- df_v_fr %>%select(Date, Diesel_delta, Essence_delta, Conventional_Hybrid_delta, Plug_in_Hybrid_delta, Electrique_delta) %>%gather(key ="Fuel_Type", value ="Delta", -Date)# Map Fuel_Type values to desired nameslong_df_v_fr$Fuel_Type <-recode(long_df_v_fr$Fuel_Type,"Diesel_delta"="Diesel","Essence_delta"="Petrol","Conventional_Hybrid_delta"="Conventional hybrid","Plug_in_Hybrid_delta"="Plug-in hybrid","Electrique_delta"="Electricity")# Create ggplotp <-ggplot(long_df_v_fr, aes(x = Date, y = Delta, color = Fuel_Type)) +geom_line(size =1) +labs(title ="Evolution of cars registered in France over the years by fuel type (Deltas)",x ="Years",y ="Value",color ="Fuel Category") +theme_minimal() +scale_color_manual(values =c("Diesel"="red", "Petrol"="blue", "Conventional hybrid"="green", "Plug-in hybrid"="purple","Electricity"="orange"))# Convert to interactive plot and adjust legendggplotly(p, width =600, height =400) %>%layout(legend =list(orientation ='h', x =0.5, xanchor ='center', y =-0.3))
3.6 Political Parties
Here, we have a cluster plot of the Swiss Cantons according to their political stance on sustainability in 1999 and in 2023
Code
# Let's start with 1999political_data_1999 <- political_combined_data[,-8] %>%filter(Year ==as.Date("1999-01-01")) %>%select(c("Canton", "Against", "Slightly Against", "Neutral", "Slightly in Favour", "In Favour"))# We will use the K-Means method# We start by looking for the right amount of clustersfviz_nbclust(political_data_1999[,-1], kmeans, method ="wss") +geom_vline(xintercept =7, linetype =2)# We can see that 7 clusters seems to be the choice# Changing my tibblepol_cantons_1999 <-as.data.frame(political_data_1999)rownames(pol_cantons_1999) <- pol_cantons_1999$Cantonpol_cantons_1999 <- pol_cantons_1999[,-1]# Fit k-means with 7 clusterskm.res <-kmeans(pol_cantons_1999, 5, nstart =26)fviz_cluster(km.res, data = pol_cantons_1999) +ggtitle("Cluster Analysis of Cantons' stance on sustainability 1999")# Then the PCA:pca_cantons <-prcomp(pol_cantons_1999)pca_plot <-fviz_pca(pca_cantons,col.ind =as.factor(km.res$cluster),palette ="jco",label ="all",repel =TRUE) +ggtitle("PCA Plot for each Canton")# Adjust zoom level by setting xlim and ylimpca_plot +coord_cartesian(xlim =c(-100, 100), ylim =c(-100, 100))#Now the PCA for the variablespca_variables <-prcomp(t(pol_cantons_1999))# Plot variablesfviz_pca_var(pca_variables, col.var ="contrib", gradient.cols =c("#00AFBB", "#E7B800", "#FC4E07"),repel =TRUE) +ggtitle("PCA Plot for Variables")
Code
# Now for 2023political_data_2023 <- political_combined_data[,-8] %>%filter(Year ==as.Date("2023-01-01")) %>%select(c("Canton", "Against", "Slightly Against", "Neutral", "Slightly in Favour", "In Favour"))# Changing my tibblepol_canton_2023 <-as.data.frame(political_data_2023)row.names(pol_canton_2023) <- pol_canton_2023$Cantonpol_canton_2023 <- pol_canton_2023[,-1]# How many clusters?fviz_nbclust(pol_canton_2023, kmeans, method ="wss") +geom_vline(xintercept =6, linetype =2)# We can see that 6 clusters seems to be the choice# Fit k-means with 6 clusterskm.res_2023 <-kmeans(pol_canton_2023, 6, nstart =26)fviz_cluster(km.res_2023, data = pol_canton_2023) +ggtitle("Cluster Analysis of Cantons' stance on sustainability 2023")#Then the PCA plotpca_cantons <-prcomp(pol_canton_2023)pca_plot <-fviz_pca(pca_cantons,col.ind =as.factor(km.res$cluster),palette ="jco",label ="all",repel =TRUE) +ggtitle("PCA Plot for each Canton")# Adjust zoom level by setting xlim and ylimpca_plot +coord_cartesian(xlim =c(-100, 100), ylim =c(-100, 100))#Now the PCA for the variablespca_variables <-prcomp(t(pol_canton_2023))# Plot variablesfviz_pca_var(pca_variables, col.var ="contrib", gradient.cols =c("#00AFBB", "#E7B800", "#FC4E07"),repel =TRUE) +ggtitle("PCA Plot for Variables")
3.7 Swiss vs France
3.7.1 Electric vs Hybrid vs Petrol
Here we compare and visualize specific vehicle fuel types between Swiss and French datasets.
The thicker line are the for Switzerland, the others are for France. We standardized the counts in both the Swiss and French datas ets for comparison purposes. Please juggle with the interactive plot to make it more readable.
Code
# Filtering Swiss data for specific fuel typesswiss_specific_fuel <- df_v %>%filter(Fuel %in%c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol")) %>%filter(Location =='Switzerland') |>filter(VehicleType =='Passenger car') |>filter(Date >as.Date('2012-01-01')) |>filter(Date <as.Date('2021-12-31'))# Selecting equivalent columns from the French datasetfrench_specific_fuel <- df_v_fr %>%select(Date, Diesel_delta, Essence_delta, Conventional_Hybrid_delta, Plug_in_Hybrid_delta, Electrique_delta) # Adjust column names accordingly# Reshape French dataset to long format for easier plottingfrench_specific_fuel_long <- french_specific_fuel %>%pivot_longer(cols =-Date, names_to ="Fuel", values_to ="Count")# Standardize counts in each datasetswiss_specific_fuel <- swiss_specific_fuel %>%mutate(Count =scale(Count))french_specific_fuel_long <- french_specific_fuel_long %>%mutate(Count =scale(Count))# Rename the 'Fuel' column in the French datasetfrench_specific_fuel_long <- french_specific_fuel_long %>%mutate(Fuel =case_when( Fuel =="Diesel_delta"~"Diesel", Fuel =="Essence_delta"~"Petrol", Fuel =="Conventional_Hybrid_delta"~"Conventional hybrid", Fuel =="Plug_in_Hybrid_delta"~"Plug-in hybrid", Fuel =="Electrique_delta"~"Electricity" ))# Define color palette for each fuel typefuel_colors <-c("Diesel"="black", "Electricity"="green", "Conventional hybrid"="purple", "Plug-in hybrid"="blue", "Petrol"="orange")p <-ggplot() +geom_smooth(data = swiss_specific_fuel, aes(x = Date, y = Count, color = Fuel), method ="loess", se =FALSE, size =1.5) +geom_line(data = french_specific_fuel_long, aes(x = Date, y = Count, color = Fuel), alpha =0.4, size =0.8) +scale_color_manual(values = fuel_colors, labels =c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol"),breaks =c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol")) +labs(x ="Date", y ="Standardized Count", color ="Fuel Type") +theme_minimal() +geom_text(data =data.frame(x =as.Date("2021-01-01"), y =c(3, 2.8), label =c("Switzerland has", "the thickest line")), aes(x = x, y = y, label = label, color = label), size =4, show.legend =FALSE)interactive_plot <-ggplotly(p, width =600, height =400, tooltip =c("x", "y", "color"))interactive_plot <- interactive_plot %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))interactive_plot
Code
p <-ggplot() +geom_smooth(data = swiss_specific_fuel, aes(x = Date, y = Count, color = Fuel), method ="loess", se =FALSE, size =1.5) +geom_line(data = french_specific_fuel_long, aes(x = Date, y = Count, color = Fuel), alpha =0.4, size =0.8) +scale_color_manual(values = fuel_colors, labels =c("Switzerland", "France")) +labs(x ="Date", y ="Standardized Count") +theme_minimal() +theme(legend.position ="bottom",strip.background =element_blank(),strip.text.x =element_text(size =10, angle =0),axis.text.x =element_blank(),axis.text.y =element_blank(),axis.title.x =element_text(size =12, margin =margin(t =10)),axis.title.y =element_text(size =12, margin =margin(r =10))) +facet_wrap(~Fuel, scales ='free_y', ncol =1)# Convert to interactive plotinteractive_plot <-ggplotly(p, width =600, height =600, tooltip =c("x", "y", "color"))interactive_plot <- interactive_plot %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))interactive_plot
Code
swiss_specific_fuel#> # A tibble: 833 x 5#> Location VehicleType Fuel Count[,1] Date #> <chr> <chr> <chr> <dbl> <date> #> 1 Switzerland Passenger car Petrol 2.06 2012-02-01#> 2 Switzerland Passenger car Diesel 1.05 2012-02-01#> 3 Switzerland Passenger car Conventional hybrid -0.602 2012-02-01#> 4 Switzerland Passenger car Plug-in hybrid -0.652 2012-02-01#> 5 Switzerland Passenger car Conventional hybrid -0.653 2012-02-01#> 6 Switzerland Passenger car Plug-in hybrid -0.655 2012-02-01#> 7 Switzerland Passenger car Electricity -0.651 2012-02-01#> 8 Switzerland Passenger car Petrol 3.27 2012-03-01#> 9 Switzerland Passenger car Diesel 1.51 2012-03-01#> 10 Switzerland Passenger car Conventional hybrid -0.558 2012-03-01#> # i 823 more rowsfrench_specific_fuel_long#> # A tibble: 55 x 3#> Date Fuel Count[,1]#> <date> <chr> <dbl>#> 1 2012-01-01 Diesel 2.92 #> 2 2012-01-01 Petrol -1.17 #> 3 2012-01-01 Conventional hybrid -0.220#> 4 2012-01-01 Plug-in hybrid -0.194#> 5 2012-01-01 Electricity -0.222#> 6 2013-01-01 Diesel 2.06 #> 7 2013-01-01 Petrol -1.58 #> 8 2013-01-01 Conventional hybrid -0.164#> 9 2013-01-01 Plug-in hybrid -0.207#> 10 2013-01-01 Electricity -0.213#> # i 45 more rows
3.8 EV and Google Trends
All in all, the Google trend seems to be in adequation with the actual rise in EVs around Switzerland.
Code
df_v_electric <- df_v |>filter(VehicleType =="Passenger car") |>filter(Fuel =='Electricity')df_electric_vehicles_agg <- df_v_electric %>%group_by(Date) %>%summarize(Count =sum(Count))# Resample both datasets to a monthly frequency using the first day of the month# Merge datasetsmerged_df <-merge(df_electric_vehicles_agg, df_gtrends, by ="Date")# Calculate the ratio for the secondary axismax_values <- merged_df |>summarize(max_count =max(Count, na.rm =TRUE), max_search =max(SearchRatio, na.rm =TRUE))ratio <- max_values$max_count / max_values$max_search# Add the ratio-adjusted SearchRatio to the merged datasetmerged_df <- merged_df |>mutate(AdjustedSearchRatio = SearchRatio * ratio)# Plotting with smoothing and color changesp <-ggplot(merged_df, aes(x = Date)) +geom_bar(aes(y = AdjustedSearchRatio, fill ="Google Trends"), stat ="identity") +geom_smooth(aes(y = Count, color ="Electric Vehicles Smoothed"), method ="loess", span =0.2) +scale_y_continuous("Number of Electric Vehicles",sec.axis =sec_axis(~ . / ratio, name ="Google Trends") ) +labs(title ="Comparison of Electric Vehicle Rise and Google Trends Over Time",x ="Date", fill ="Legend", color ="Legend") +scale_fill_manual(values =c("Google Trends"="darkred")) +scale_color_manual(values =c("Electric Vehicles Smoothed"="blue")) +theme_minimal() +theme(legend.position ="bottom")# Convert to interactive plotinteractive_plot <-ggplotly(p, width =600, height =400) %>%layout(legend =list(orientation ="h", x =0.5, xanchor ="center", y =-0.3))interactive_plot
3.9 EV and Oil Price
The results of the graphic comparing oil price and EVs evolution comforts us in our choice of oil price as an explicative variable. Indeed, the tendency is quite similar since 2020. In fact, the previous years refer to a period where EVs were not as commercialized as today. Obviously, we are aware of the numerous other variables explaining both oil price and EVs rise over time.
Code
# Resample data to monthly frequency and calculate mean oil pricedf_oil_monthly <- df_oil %>%mutate(Date =as.Date(format(Date, "%Y-%m-01"))) %>%group_by(Date) %>%summarize(Price =mean(Price), .groups ='drop')# Resample electric vehicles data to monthly frequency and sum countsdf_electric_vehicles_monthly <- df_electric_vehicles_agg %>%mutate(Date =as.Date(format(Date, "%Y-%m-01"))) %>%group_by(Date) %>%summarize(Count =sum(Count), .groups ='drop')# Merge datasetsdf_merged <-full_join(df_electric_vehicles_monthly, df_oil_monthly, by ="Date")# Calculate the ratio for the secondary axismax_values <- df_merged |>summarize(max_count =max(Count, na.rm =TRUE), max_price =max(Price, na.rm =TRUE))ratio <- max_values$max_count / max_values$max_price# Add the ratio-adjusted Price to the merged datasetdf_merged <- df_merged |>mutate(AdjustedPrice = Price * ratio)# Plotting with smoothing and color changesp <-ggplot(df_merged, aes(x = Date)) +geom_smooth(aes(y = Count, color ="Electric Vehicles Smoothed"), method ="loess", span =0.2) +geom_line(aes(y = AdjustedPrice, color ="Oil Price")) +scale_y_continuous("Number of Electric Vehicles",sec.axis =sec_axis(~ . / ratio, name ="Oil Price") ) +labs(title ="Comparison of Electric Vehicle Rise and Oil Prices Over Time",x ="Date", color ="Legend") +scale_color_manual(values =c("Electric Vehicles Smoothed"="blue", "Oil Price"="darkred")) +theme_minimal() +theme(legend.position ="bottom")# Convert to interactive plotinteractive_plot <-ggplotly(p, width =600, height =400) %>%layout(legend =list(orientation ="h", x =0.5, xanchor ="center", y =-0.3))interactive_plot
4 Analysis
TO DO’s
Answers to the research questions
Different methods considered
Competing approaches
Justifications
4.2 RQ1
Based on past electric vehicle adoption trends in Switzerland, can we forecast future adoption rates and pinpoint times of significant increases or decreases correlated with major events or policy changes?
Simple Linear Regression
With Oil
Choosing Dependent Variable, sales or EV registrations
Sales Data (from df_sales_EV):
Pros: Directly reflects market demand and consumer purchasing behavior. Cons: Can be influenced by short-term factors such as promotions or subsidies, which might not indicate long-term adoption trends.
Registration Data (from df_v_electric):
Pros: Represents actual additions to the vehicle population and can be more indicative of long-term trends in EV adoption. Cons: Might lag behind sales data, as registration occurs post-purchase and can be influenced by administrative processes.
interested in long-term trends in EV adoption and usage, registration data could be more appropriate.
The decision to remove rows with missing values was justified as the number of missing values in the Price column was relatively small (28 out of a larger dataset) and couldn’t be reliably imputed, ensuring that the analysis was performed on a more complete and consistent dataset.
Code
# Select the columns you want to keep in df_v_electricdf_v_electric <- df_v_electric %>%select(Location, Count, Date)# Merge df_oil_monthly with df_v_electric based on the common 'Date' columndf_merged <-merge(df_v_electric, df_oil_monthly, by ="Date", all.x =TRUE)# Remove rows with missing values in the 'Price' columndf_merged <- df_merged[complete.cases(df_merged), ]# Splitting the data into training and testing setsset.seed(123) # for reproducibilitysplit_index <-sample(1:nrow(df_merged), 0.8*nrow(df_merged))train_data <- df_merged[split_index, ]test_data <- df_merged[-split_index, ]# Fitting the linear modelmod1_lin <-lm(Count ~ Price, data = train_data)# Summary of the model to view coefficients and statisticssummary(mod1_lin)#> #> Call:#> lm(formula = Count ~ Price, data = train_data)#> #> Residuals:#> Min 1Q Median 3Q Max #> -40 -40 -39 -27 5514 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 40.42528 10.10153 4.00 6.4e-05 ***#> Price -0.00542 0.12620 -0.04 0.97 #> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 222 on 4814 degrees of freedom#> Multiple R-squared: 3.83e-07, Adjusted R-squared: -0.000207 #> F-statistic: 0.00184 on 1 and 4814 DF, p-value: 0.966# Predicting with the test datapredictions <-predict(mod1_lin, test_data)# Calculating residuals (difference between actual and predicted values)residuals <- test_data$Count - predictions# Calculating RMSErmse <-sqrt(mean(residuals^2))print(paste("RMSE:", rmse))#> [1] "RMSE: 247.396521283555"# Creating a summary table for the linear modeltable_regression <-tbl_regression(mod1_lin)# Viewing the tabletable_regression
Characteristic
Beta
95% CI1
p-value
Price
-0.01
-0.25, 0.24
>0.9
1 CI = Confidence Interval
The linear regression analysis results indicate that the Price variable has a very weak or negligible effect on explaining the variation in the Count variable. Specifically:
The coefficient for Price is very close to zero (-0.00542), indicating that there is almost no linear relationship between Price and Count.
The p-value for Price (0.97) is much greater than the common significance level of 0.05. This high p-value suggests that Price is not statistically significant in predicting Count, as it fails to reject the null hypothesis that the coefficient is zero.
The R-squared value is very low (0.000523), indicating that the linear regression model explains only a negligible fraction of the variance in Count.
The Root Mean Squared Error (RMSE) is relatively high (247), suggesting that the model’s predictions have a substantial amount of error.
In summary, based on these results, it appears that the Price variable does not have a meaningful impact on predicting the Count variable, and the linear regression model is not suitable for explaining the relationship between these two variables. Further exploration of the data and potentially considering other factors or modeling approaches may be necessary to improve predictive accuracy.
with google trends
Code
# Merge the two datasets based on the common Date columndf_merged <-merge(df_merged, df_gtrends, by ="Date")# Check for missing values in df_merged#sum(is.na(df_merged))# Check for missing values in each column of df_merged#colSums(is.na(df_merged))# Splitting the data into training and testing setsset.seed(123) # for reproducibilitysplit_index <-sample(1:nrow(df_merged), 0.8*nrow(df_merged))train_data <- df_merged[split_index, ]test_data <- df_merged[-split_index, ]# Fitting the linear modelmod2_lin <-lm(Count ~ SearchRatio, data = train_data)# Summary of the model to view coefficients and statisticssummary(mod2_lin)#> #> Call:#> lm(formula = Count ~ SearchRatio, data = train_data)#> #> Residuals:#> Min 1Q Median 3Q Max #> -252 -39 -13 6 5425 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) -82.043 7.139 -11.5 <2e-16 ***#> SearchRatio 5.957 0.314 19.0 <2e-16 ***#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 214 on 4814 degrees of freedom#> Multiple R-squared: 0.0695, Adjusted R-squared: 0.0693 #> F-statistic: 359 on 1 and 4814 DF, p-value: <2e-16# Predicting with the test datapredictions <-predict(mod2_lin, test_data)# Calculating residuals (difference between actual and predicted values)residuals <- test_data$Count - predictions# Calculating RMSErmse <-sqrt(mean(residuals^2))print(paste("RMSE:", rmse))#> [1] "RMSE: 240.599640776535"# Creating a summary table for the linear modeltable_regression2 <-tbl_regression(mod2_lin)# Viewing the tabletable_regression2
Characteristic
Beta
95% CI1
p-value
SearchRatio
6.0
5.3, 6.6
<0.001
1 CI = Confidence Interval
Overall, while there is a statistically significant relationship between SearchRatio and Count, the model’s R-squared value suggests that other factors not included in the model may also influence the Count variable. You may want to explore additional variables or more complex models to improve predictive accuracy.
Multivariable Regression
with Demographic groups, Oil Price and Google Trend
Code
# Merge the data frames on the 'Year' columndf_merged <-merge(df_merged, df_demographic, by.x ="Date", by.y ="Year")# Splitting the data into training and testing setsset.seed(123) # for reproducibilitysplit_index <-sample(1:nrow(df_merged), 0.8*nrow(df_merged))train_data <- df_merged[split_index, ]test_data <- df_merged[-split_index, ]# Fitting the multivariable linear modelmod1_multi <-lm(Count ~ SearchRatio + Price +`Generation Z`+ Millennials +`Generation X`+`Baby Boomers`, data = df_merged)# Summary of the model to view coefficients and statisticssummary(mod1_multi)#> #> Call:#> lm(formula = Count ~ SearchRatio + Price + `Generation Z` + Millennials + #> `Generation X` + `Baby Boomers`, data = df_merged)#> #> Residuals:#> Min 1Q Median 3Q Max #> -132.8 -13.9 -4.8 3.7 2068.2 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 1.32e+02 1.06e+03 0.12 0.901 #> SearchRatio 1.04e+00 1.33e+00 0.78 0.437 #> Price 4.10e-02 2.83e-01 0.14 0.885 #> `Generation Z` 1.38e-03 1.80e-03 0.77 0.442 #> Millennials -3.64e-04 4.96e-04 -0.73 0.464 #> `Generation X` -1.02e-03 7.95e-04 -1.28 0.201 #> `Baby Boomers` 6.32e-04 2.66e-04 2.38 0.018 *#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 117 on 497 degrees of freedom#> Multiple R-squared: 0.0924, Adjusted R-squared: 0.0815 #> F-statistic: 8.43 on 6 and 497 DF, p-value: 9.9e-09# Predicting with the test datapredictions <-predict(mod1_multi, test_data)# Evaluating the model# Calculating R-squared valuer_squared <-cor(test_data$Count, predictions)^2cat("R-squared: ", r_squared, "\n")#> R-squared: 0.365# Calculating RMSE (Root Mean Squared Error)rmse <-sqrt(mean((predictions - test_data$Count)^2))cat("RMSE: ", rmse, "\n")#> RMSE: 35.6# Create a summary table for the linear modeltable_regression_mod1_multi <-tbl_regression(mod1_multi)# Print the tabletable_regression_mod1_multi
Characteristic
Beta
95% CI1
p-value
SearchRatio
1.0
-1.6, 3.7
0.4
Price
0.04
-0.52, 0.60
0.9
Generation Z
0.00
0.00, 0.00
0.4
Millennials
0.00
0.00, 0.00
0.5
Generation X
0.00
0.00, 0.00
0.2
Baby Boomers
0.00
0.00, 0.00
0.018
1 CI = Confidence Interval
This multivariate linear regression model does not appear to have a strong explanatory power, as indicated by the low adjusted R-squared value and the relatively high RMSE. Additionally, only the Baby Boomers variable shows statistical significance in predicting the Count, while other variables do not appear to be significant in this context. You may want to consider further refining the model or exploring other variables to improve its predictive performance.
Adding Political Parties
Code
# Remove 'Confederation' from df_mergeddf_multi_reg <- df_merged[df_merged$Location !='Confederation', ]# Step 2: Aggregate data for 'Switzerland' in politic data, For each year, sum the values of the 26 cantons to create a combined observation labeled "Switzerland". and then divide by 26political_combined_data$Year <-year(ymd(political_combined_data$Year))df_politics <- political_combined_data %>%group_by(Year) %>%summarise(across(c(`Against`, `Slightly Against`, `Neutral`, `Slightly in Favour`, `In Favour`), sum, na.rm =TRUE)) %>%mutate(Canton ='Switzerland')#> Warning: There was 1 warning in `summarise()`.#> i In argument: `across(...)`.#> i In group 1: `Year = 1999`.#> Caused by warning:#> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.#> Supply arguments directly to `.fns` through an anonymous function#> instead.#> #> # Previously#> across(a:b, mean, na.rm = TRUE)#> #> # Now#> across(a:b, \(x) mean(x, na.rm = TRUE))# Append the new 'Switzerland' rows to df_politicsdf_politics <-bind_rows(political_combined_data, df_politics)# List of column names you want to divide by 26columns_to_divide <-c("Against", "Slightly Against", "Neutral", "Slightly in Favour", "In Favour")df_politics <- df_politics %>%mutate_at(columns_to_divide, list(~./26))# Step 3: Align 'Date' in df_multi_reg to 'Year' in df_politics df_multi_reg$Year <-year(ymd(df_multi_reg$Date))df_multi_reg <-select(df_multi_reg, -Date)#Merge the datasets on the aligned 'Location/Canton' and 'Date/Year' columns.df_politics$Location <- df_politics$Canton # Step 4: Merge the datasets on 'Canton' and 'Year'df_multi_reg <-merge(df_multi_reg, df_politics, by =c("Location", "Year"), all =FALSE)# Reorder columns in a data framedf_multi_reg <- df_multi_reg %>%select(Count, everything())df_multi_reg <- df_multi_reg %>%select(-Canton, -KANTONSNUM)any(is.na(df_multi_reg))#> [1] FALSE# Perform regression analysis without 'Location'full_model <-lm(Count ~ Year + Price + SearchRatio +`Generation Z`+ Millennials +`Generation X`+`Baby Boomers`+ Against +`Slightly Against`+ Neutral +`Slightly in Favour`+`In Favour`, data = df_multi_reg)# Perform backward eliminationreduced_model <-step(full_model, direction ="backward")#> Start: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> `Generation X` + `Baby Boomers` + Against + `Slightly Against` + #> Neutral + `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> `Generation X` + Against + `Slightly Against` + Neutral + #> `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> Against + `Slightly Against` + Neutral + `Slightly in Favour` + #> `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Against + #> `Slightly Against` + Neutral + `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + Against + `Slightly Against` + #> Neutral + `Slightly in Favour` + `In Favour`#> #> Df Sum of Sq RSS AIC#> - Against 1 101 68999 714#> - `Slightly in Favour` 1 112 69010 714#> - Year 1 537 69435 714#> <none> 68898 715#> - SearchRatio 1 1555 70453 716#> - Price 1 1668 70566 716#> - Neutral 1 6034 74932 723#> - `Slightly Against` 1 17063 85961 737#> - `In Favour` 1 22929 91828 745#> #> Step: AIC=714#> Count ~ Year + Price + SearchRatio + `Slightly Against` + Neutral + #> `Slightly in Favour` + `In Favour`#> #> Df Sum of Sq RSS AIC#> - `Slightly in Favour` 1 78 69077 712#> - Year 1 463 69462 712#> <none> 68999 714#> - SearchRatio 1 1459 70459 714#> - Price 1 1624 70624 714#> - Neutral 1 6289 75289 721#> - `Slightly Against` 1 17073 86073 736#> - `In Favour` 1 66523 135523 785#> #> Step: AIC=712#> Count ~ Year + Price + SearchRatio + `Slightly Against` + Neutral + #> `In Favour`#> #> Df Sum of Sq RSS AIC#> - Year 1 478 69556 711#> <none> 69077 712#> - SearchRatio 1 1484 70561 712#> - Price 1 1696 70773 712#> - Neutral 1 6457 75534 719#> - `Slightly Against` 1 18225 87303 735#> - `In Favour` 1 66469 135547 783#> #> Step: AIC=711#> Count ~ Price + SearchRatio + `Slightly Against` + Neutral + #> `In Favour`#> #> Df Sum of Sq RSS AIC#> - Price 1 1257 70813 710#> <none> 69556 711#> - SearchRatio 1 3651 73207 714#> - Neutral 1 6957 76513 719#> - `Slightly Against` 1 17771 87326 733#> - `In Favour` 1 65991 135547 781#> #> Step: AIC=710#> Count ~ SearchRatio + `Slightly Against` + Neutral + `In Favour`#> #> Df Sum of Sq RSS AIC#> <none> 70813 710#> - SearchRatio 1 4023 74836 714#> - Neutral 1 6722 77535 718#> - `Slightly Against` 1 17769 88582 733#> - `In Favour` 1 65625 136438 779# View the summary of the reduced modelsummary(reduced_model)#> #> Call:#> lm(formula = Count ~ SearchRatio + `Slightly Against` + Neutral + #> `In Favour`, data = df_multi_reg)#> #> Residuals:#> Min 1Q Median 3Q Max #> -111.61 -10.32 -2.99 8.03 153.23 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) -12.858 6.017 -2.14 0.0350 * #> SearchRatio 0.827 0.342 2.42 0.0173 * #> `Slightly Against` -10.300 2.026 -5.08 1.7e-06 ***#> Neutral -18.494 5.915 -3.13 0.0023 ** #> `In Favour` 20.121 2.059 9.77 2.4e-16 ***#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 26.2 on 103 degrees of freedom#> Multiple R-squared: 0.699, Adjusted R-squared: 0.687 #> F-statistic: 59.7 on 4 and 103 DF, p-value: <2e-16# Perform backward eliminationreduced_model <-step(full_model, direction ="backward")#> Start: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> `Generation X` + `Baby Boomers` + Against + `Slightly Against` + #> Neutral + `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> `Generation X` + Against + `Slightly Against` + Neutral + #> `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Millennials + #> Against + `Slightly Against` + Neutral + `Slightly in Favour` + #> `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + `Generation Z` + Against + #> `Slightly Against` + Neutral + `Slightly in Favour` + `In Favour`#> #> #> Step: AIC=715#> Count ~ Year + Price + SearchRatio + Against + `Slightly Against` + #> Neutral + `Slightly in Favour` + `In Favour`#> #> Df Sum of Sq RSS AIC#> - Against 1 101 68999 714#> - `Slightly in Favour` 1 112 69010 714#> - Year 1 537 69435 714#> <none> 68898 715#> - SearchRatio 1 1555 70453 716#> - Price 1 1668 70566 716#> - Neutral 1 6034 74932 723#> - `Slightly Against` 1 17063 85961 737#> - `In Favour` 1 22929 91828 745#> #> Step: AIC=714#> Count ~ Year + Price + SearchRatio + `Slightly Against` + Neutral + #> `Slightly in Favour` + `In Favour`#> #> Df Sum of Sq RSS AIC#> - `Slightly in Favour` 1 78 69077 712#> - Year 1 463 69462 712#> <none> 68999 714#> - SearchRatio 1 1459 70459 714#> - Price 1 1624 70624 714#> - Neutral 1 6289 75289 721#> - `Slightly Against` 1 17073 86073 736#> - `In Favour` 1 66523 135523 785#> #> Step: AIC=712#> Count ~ Year + Price + SearchRatio + `Slightly Against` + Neutral + #> `In Favour`#> #> Df Sum of Sq RSS AIC#> - Year 1 478 69556 711#> <none> 69077 712#> - SearchRatio 1 1484 70561 712#> - Price 1 1696 70773 712#> - Neutral 1 6457 75534 719#> - `Slightly Against` 1 18225 87303 735#> - `In Favour` 1 66469 135547 783#> #> Step: AIC=711#> Count ~ Price + SearchRatio + `Slightly Against` + Neutral + #> `In Favour`#> #> Df Sum of Sq RSS AIC#> - Price 1 1257 70813 710#> <none> 69556 711#> - SearchRatio 1 3651 73207 714#> - Neutral 1 6957 76513 719#> - `Slightly Against` 1 17771 87326 733#> - `In Favour` 1 65991 135547 781#> #> Step: AIC=710#> Count ~ SearchRatio + `Slightly Against` + Neutral + `In Favour`#> #> Df Sum of Sq RSS AIC#> <none> 70813 710#> - SearchRatio 1 4023 74836 714#> - Neutral 1 6722 77535 718#> - `Slightly Against` 1 17769 88582 733#> - `In Favour` 1 65625 136438 779# View the summary of the reduced modelsummary(reduced_model)#> #> Call:#> lm(formula = Count ~ SearchRatio + `Slightly Against` + Neutral + #> `In Favour`, data = df_multi_reg)#> #> Residuals:#> Min 1Q Median 3Q Max #> -111.61 -10.32 -2.99 8.03 153.23 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) -12.858 6.017 -2.14 0.0350 * #> SearchRatio 0.827 0.342 2.42 0.0173 * #> `Slightly Against` -10.300 2.026 -5.08 1.7e-06 ***#> Neutral -18.494 5.915 -3.13 0.0023 ** #> `In Favour` 20.121 2.059 9.77 2.4e-16 ***#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 26.2 on 103 degrees of freedom#> Multiple R-squared: 0.699, Adjusted R-squared: 0.687 #> F-statistic: 59.7 on 4 and 103 DF, p-value: <2e-16# Create a summary table for the linear modeltable_regression_mod2_multi <-tbl_regression(reduced_model)# Print the tabletable_regression_mod2_multi
Characteristic
Beta
95% CI1
p-value
SearchRatio
0.83
0.15, 1.5
0.017
Slightly Against
-10
-14, -6.3
<0.001
Neutral
-18
-30, -6.8
0.002
In Favour
20
16, 24
<0.001
1 CI = Confidence Interval
Starting Model: The initial model includes a wide range of predictor variables: Year, Price, SearchRatio, various generational groups (Generation Z, Millennials, Generation X, Baby Boomers), and several categories of opinions (Against, Slightly Against, Neutral, Slightly in Favour, In Favour).
Stepwise Process: The stepwise procedure iteratively removes the least significant variables based on their contribution to the model (using the Akaike Information Criterion, AIC, as a guide). A lower AIC indicates a better model fit with respect to the number of variables.
Model Refinement: As we progress through the steps, we observe the removal of several variables like Baby Boomers, Millennials, and Generation X. This suggests that these variables were not significantly contributing to the explanation of variations in Count.
Final Model: The last step shows the model with the variables: SearchRatio, Slightly Against, Neutral, and In Favour. This model has an AIC of 710, which is lower than the starting AIC of 715, indicating a more efficient model.
Significance of Remaining Variables: In the final model, all the variables are considered significant contributors. If any were to be removed, it would result in a higher AIC, indicating a less optimal model.
Interpretation of Variables:
SearchRatio, Slightly Against, Neutral, and In Favour are significant predictors for Count.
The absence of demographic variables in the final model suggests that the adoption count may not be strongly related to these demographic factors, or their effect is captured by other variables.
The presence of opinion-related variables (like In Favour) indicates a possible correlation between public opinion and the count of the dependent variable (possibly related to EV adoption or similar context).
Cautions: While stepwise regression is useful for variable selection, it can sometimes lead to overfitting or neglecting important variables that don’t show strong individual effects but are important in combination with others. Hence, the results should be interpreted with caution, and further analysis (like checking for multicollinearity, interactions between variables, etc.) is recommended to validate the findings.
Vizualisation of regression Quality
Code
# Set up the layout for a 2x2 plotpar(mfrow =c(2, 2))# Create the diagnostic plotsplot(full_model, 1, main ="Full Model Residuals vs Fitted")plot(full_model, 2, main ="Full Model Q-Q Plot")plot(full_model, 3, main ="Full Model Scale-Location Plot")plot(full_model, 5, main ="Full Model Residuals vs Leverage")# Reset to default layoutpar(mfrow =c(1, 1))
In the plot, most of the data points seem to be randomly scattered around the horizontal line at zero, which is good as it suggests that the residuals have constant variance and that the model’s predictions are unbiased. However, there are two data points labeled 74 and 76 that stand out from the rest, with one particularly far from zero. These points could be outliers or influential points that have a large effect on the regression line.
The decision to remove outliers should not be taken lightly. Simply removing outliers solely because they are outliers can lead to overfitting the model to the remaining data, and the results may not generalize well to new data. Instead, we should investigate why those data points are outliers:
Are they the result of data entry errors or measurement errors? If so, they should be corrected or removed.
Do they represent a segment of the population that is not well explained by the model? If that’s the case, you might need a more complex model.
Are they true extreme values that are part of the natural variation of the data? If they are valid observations, removing them could bias your results.
4.3 RQ2
In comparing regions in Switzerland, which areas show higher or lower adoption of electric vehicles, and how does this regional adoption align or vary with external factors like oil price changes, political opinions, and demographic shifts?
From the initial data:
In 2005, the EV adoption rate per capita was approximately 0.000004, which means there were about 4 EVs per million people. By 2009, this rate increased to approximately 18 EVs per million people.
This approach will give us a general sense of EV adoption in relation to the overall population but won’t provide regional demographic granularity.
The merged data now includes the total population for each year in Switzerland and the total count of electric vehicles (EVs) for those years. We have also calculated the EV adoption rate per capita, which gives us an insight into how EV adoption scales with the population size.
These figures show a growing trend in EV adoption in relation to the population size, albeit the numbers are still quite small relative to the total population.
The trend shows a gradual increase in EV adoption relative to the population size, indicating a growing acceptance and usage of electric vehicles in Switzerland during this period.
Code
df_demo <- df_demographicdf_ev <- df_v_electric# Convert Date and Year to Date typedf_ev$Date <-as.Date(df_ev$Date)df_demo$Year <-as.Date(df_demo$Year)# Summing up the population for each yeardf_demo$total_population <-rowSums(df_demo[,c("Generation Z", "Millennials", "Generation X", "Baby Boomers")])# Aggregating EV data by yeardf_ev_yearly <- df_ev %>%group_by(Year =as.Date(format(Date, "%Y-01-01"))) %>%summarize(total_ev =sum(Count))# Merging the datasetsmerged_data <-merge(df_ev_yearly, df_demo, by ="Year")# Calculating EV adoption per capitamerged_data$ev_per_capita <- merged_data$total_ev / merged_data$total_population
Code
# Creating a ggplot object with your datap <-ggplot(merged_data, aes(x = Year, y = ev_per_capita, group =1)) +geom_line(color ="darkblue", size =1) +labs(title ="EV Adoption Per Capita Over Time in Switzerland",x ="Year",y ="EV Adoption Per Capita")# Animate the plot with gganimate, revealing the line over timeanimated_plot <- p +transition_reveal(Year)# Render the animationanimate(animated_plot, renderer = gganimate::gifski_renderer(), width =600, height =400, res =96)
The correlation matrix below shows the relationships between the proportions of different generational groups (Generation Z, Millennials, Generation X, Baby Boomers) and the EV adoption rate per capita in Switzerland. The heatmap provides the following insights:
The correlation coefficients indicate the strength and direction of the relationship between each pair of variables. Positive values suggest a positive correlation (as one increases, so does the other), while negative values suggest an inverse relationship.
*babyboomers !
Code
# Data Preparationdf_ev$Date <-as.Date(df_ev$Date)df_demo$Year <-as.Date(df_demo$Year)# Summing up the population for each yeardf_demo$total_population <-rowSums(df_demo[,c("Generation Z", "Millennials", "Generation X", "Baby Boomers")])# Calculate proportionsdf_demo$prop_gen_z <- df_demo$`Generation Z`/ df_demo$total_populationdf_demo$prop_millennials <- df_demo$Millennials / df_demo$total_populationdf_demo$prop_gen_x <- df_demo$`Generation X`/ df_demo$total_populationdf_demo$prop_boomers <- df_demo$`Baby Boomers`/ df_demo$total_population# Aggregating EV data by yeardf_ev_yearly <- df_ev %>%group_by(Year =as.Date(format(Date, "%Y-01-01"))) %>%summarize(total_ev =sum(Count))# Merging the datasetsmerged_data <-merge(df_ev_yearly, df_demo, by ="Year")# Calculating EV adoption per capitamerged_data$ev_per_capita <- merged_data$total_ev / merged_data$total_population# Correlation Matrixcorrelation_matrix <-cor(merged_data[,c("prop_gen_z", "prop_millennials", "prop_gen_x", "prop_boomers", "ev_per_capita")])# Melting the correlation matrix for ggplotmelted_correlation_matrix <-melt(correlation_matrix)#> Warning in melt(correlation_matrix): The melt generic in data.table#> has been passed a matrix and will attempt to redirect to the#> relevant reshape2 method; please note that reshape2 is deprecated,#> and this redirection is now deprecated as well. To continue using#> melt methods from reshape2 while both libraries are attached, e.g.#> melt.list, you can prepend the namespace like#> reshape2::melt(correlation_matrix). In the next version, this#> warning will become an error.# Modify the variable names by removing 'prop_' and replacing '_' with ' 'melted_correlation_matrix$Var1 <-gsub("prop_", "", melted_correlation_matrix$Var1)melted_correlation_matrix$Var2 <-gsub("prop_", "", melted_correlation_matrix$Var2)melted_correlation_matrix$Var1 <-gsub("_", " ", melted_correlation_matrix$Var1)melted_correlation_matrix$Var2 <-gsub("_", " ", melted_correlation_matrix$Var2)# Creating the heatmapp <-ggplot(melted_correlation_matrix, aes(Var1, Var2, fill = value)) +geom_tile() +geom_text(aes(label =sprintf("%.2f", value)), color ="white", size =4) +scale_fill_gradient(low ="lightblue", high ="darkblue", name ="Correlation") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1)) +labs(x ="", y ="", title ="Correlation Heatmap")# Convert to interactive plotggplotly(p, tooltip =c("label", "fill"))
4.3 RQ3
How has the growth of electric vehicles evolved in comparison to other countries such as France, and what factors might account for the differences in their evolution ?
Code
# Filtering Swiss data for specific fuel typesswiss_specific_fuel <- df_v %>%filter(Fuel %in%c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol")) %>%filter(Location =='Switzerland') |>filter(VehicleType =='Passenger car') |>filter(Date >as.Date('2012-01-01')) |>filter(Date <as.Date('2021-12-31'))# Selecting equivalent columns from the French datasetfrench_specific_fuel <- df_v_fr %>%select(Date, Diesel_delta, Essence_delta, Conventional_Hybrid_delta, Plug_in_Hybrid_delta, Electrique_delta) # Adjust column names accordingly# Reshape French dataset to long format for easier plottingfrench_specific_fuel_long <- french_specific_fuel %>%pivot_longer(cols =-Date, names_to ="Fuel", values_to ="Count")# Standardize counts in each datasetswiss_specific_fuel <- swiss_specific_fuel %>%mutate(Count =scale(Count))french_specific_fuel_long <- french_specific_fuel_long %>%mutate(Count =scale(Count))# Rename the 'Fuel' column in the French datasetfrench_specific_fuel_long <- french_specific_fuel_long %>%mutate(Fuel =case_when( Fuel =="Diesel_delta"~"Diesel", Fuel =="Essence_delta"~"Petrol", Fuel =="Conventional_Hybrid_delta"~"Conventional hybrid", Fuel =="Plug_in_Hybrid_delta"~"Plug-in hybrid", Fuel =="Electrique_delta"~"Electricity" ))# Define color palette for each fuel typefuel_colors <-c("Diesel"="black", "Electricity"="green", "Conventional hybrid"="purple", "Plug-in hybrid"="blue", "Petrol"="orange")p <-ggplot() +geom_smooth(data = swiss_specific_fuel, aes(x = Date, y = Count, color = Fuel), method ="loess", se =FALSE, size =1.5) +geom_line(data = french_specific_fuel_long, aes(x = Date, y = Count, color = Fuel), alpha =0.4, size =0.8) +scale_color_manual(values = fuel_colors, labels =c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol"),breaks =c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol")) +labs(x ="Date", y ="Standardized Count", color ="Fuel Type") +theme_minimal() +geom_text(data =data.frame(x =as.Date("2021-01-01"), y =c(3, 2.8), label =c("Switzerland has", "the thickest line")), aes(x = x, y = y, label = label, color = label), size =4, show.legend =FALSE)interactive_plot <-ggplotly(p, width =600, height =400, tooltip =c("x", "y", "color"))interactive_plot <- interactive_plot %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))interactive_plot
Code
# Filtering Swiss data for specific fuel typesswiss_specific_fuel <- df_v %>%filter(Fuel %in%c("Electricity", "Conventional hybrid", "Plug-in hybrid")) %>%filter(Location =='Switzerland'& VehicleType =='Passenger car'& Date >as.Date('2012-01-01') & Date <as.Date('2021-12-31'))# Selecting equivalent columns from the French datasetfrench_specific_fuel <- df_v_fr %>%select(Date, Diesel_delta, Essence_delta, Conventional_Hybrid_delta, Plug_in_Hybrid_delta, Electrique_delta) # Adjust column names accordingly# Reshape French dataset to long format for easier plottingfrench_specific_fuel_long <- french_specific_fuel %>%pivot_longer(cols =-Date, names_to ="Fuel", values_to ="Count") %>%mutate(Fuel =case_when( Fuel =="Conventional_Hybrid_delta"~"Conventional hybrid", Fuel =="Plug_in_Hybrid_delta"~"Plug-in hybrid", Fuel =="Electrique_delta"~"Electricity" ))# Standardize counts in each datasetswiss_specific_fuel <- swiss_specific_fuel %>%mutate(Count =scale(Count))french_specific_fuel_long <- french_specific_fuel_long %>%filter(Fuel %in%c("Conventional hybrid", "Plug-in hybrid", "Electricity")) %>%mutate(Count =scale(Count))# Define color palette for each fuel typefuel_colors <-c("Conventional hybrid"="purple", "Plug-in hybrid"="blue", "Electricity"="green")# Plotting with faceting and improved axis textp <-ggplot() +geom_smooth(data = swiss_specific_fuel, aes(x = Date, y = Count, color = Fuel), method ="loess", se =FALSE, size =1.5) +geom_line(data = french_specific_fuel_long, aes(x = Date, y = Count, color = Fuel), alpha =0.4, size =0.8) +scale_color_manual(values = fuel_colors, labels =c("Switzerland", "France")) +labs(x ="Date", y ="Standardized Count") +theme_minimal() +theme(legend.position ="bottom",strip.background =element_blank(),strip.text.x =element_text(size =10, angle =0),axis.text.x =element_blank(),axis.text.y =element_blank(),axis.title.x =element_text(size =12, margin =margin(t =10)),axis.title.y =element_text(size =12, margin =margin(r =10))) +facet_wrap(~Fuel, scales ='free_y', ncol =1)# Convert to interactive plotinteractive_plot <-ggplotly(p, width =600, height =600, tooltip =c("x", "y", "color"))interactive_plot <- interactive_plot %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))interactive_plot
Switzerland has the thickest line
This compares the rate at which a certain category of fuel is adopted by the citizen of each country (Switzerland vs. France) and is normalized for each country’s size.
We can see that Electricity and Plug-in Hybrid follow roughly the same trajectory. This could be caused by the fact that they are used in a very similar way. In both these categories Switzerland has a slightly higher adoption rate.
For Conventional hybrid however, France seems to have a slightly faster adopting rate.
Code
data <- charge_ch_fr # Convert year to Date format and then extract the yeardata$year <-as.Date(paste0(data$year, "-01-01"))data$year <-format(data$year, "%Y")# Sum the values by year and regiondata_summarized <- data %>%group_by(year, region) %>%summarize(total_value =sum(value))# Create the ggplotp <-ggplot(data_summarized, aes(x = year, y = total_value, fill = region)) +geom_bar(stat ="identity", position ="dodge") +scale_fill_manual(values =c("France"="#4E79A7", "Switzerland"="#F28E2B")) +# Adjusted custom colorslabs(title ="Total Availability of Charging Stations in France vs Switzerland",x ="Year",y ="Total Charging Stations") +theme_minimal() +theme(plot.title =element_text(hjust =0.5), # Center the plot titlelegend.title =element_blank()) # Remove the legend title# Convert to interactive plot using plotlyp_interactive <-ggplotly(p)p_interactive <-ggplotly(p, width =600, height =600, tooltip =c("x", "y", "color"))p_interactive <- p_interactive %>%layout(legend =list(orientation ="h", x =0, xanchor ="left", y =-0.2))p_interactive
4.5 RQ5
To what extent does the evolution in the availability of charging stations exert an influence on the adoption of electric vehicles in Switzerland? ::: {.cell layout-align=“center”}
Code
# First, let's merge the df_v and df_charge_number_CH data sets, and we will look at Fuel: Electricitydf_v_electric_total_ch <- df_v %>%filter(Fuel =="Electricity", VehicleType =="Passenger car", Location ==c("Switzerland","Confederation")) %>%select(Date, Count)sum_by_year <- df_v_electric_total_ch %>%group_by(Year = lubridate::year(Date)) %>%summarise(Total_Count =sum(Count))# Convert year to a common format for mergingsum_by_year <- sum_by_year %>%mutate(year =as.Date(paste0(Year, "-01-01")))# Merge the datasets based on the "year" columnmerged_v_charge <-left_join(sum_by_year, df_charge_number_CH, by =c("year"="year"))# cleaning merged data setmerged_v_charge <- merged_v_charge %>%filter(Year >"2011") %>%select(Year, Total_Count, powertrain, value)names(merged_v_charge)[names(merged_v_charge) =="Total_count"] <-"EVs"colnames(merged_v_charge)[colnames(merged_v_charge) =="value"] <-"Charging station"# Summing Powertrain togethermerged_v_charge <- merged_v_charge %>%group_by(Year, Total_Count) %>%summarise(Count =sum(`Charging station`))# Checking the correlationcorr_charge_ev <-cor(merged_v_charge$Total_Count, merged_v_charge$Count)# their correlation is 0.957, almost perfectly correlated (no suprise here)# Checking for lagged correlationlags_to_explore <-1:3lagged_correlation <-function(data, lag) { data %>%mutate(Count_Lagged =lag(Count, n = lag, default =NA)) %>%summarise(Correlation =cor(Total_Count, Count_Lagged, use ="complete.obs"))}# Calculate lagged correlations for each laglagged_correlations_df <-data.frame(Lag = lags_to_explore) %>%rowwise() %>%mutate(Correlation =lagged_correlation(merged_v_charge[, -1], Lag)$Correlation)# Print the resultsprint("Original Correlation:")#> [1] "Original Correlation:"print(corr_charge_ev)#> [1] 0.957print("Lagged Correlations:")#> [1] "Lagged Correlations:"print(lagged_correlations_df)#> # A tibble: 3 x 2#> # Rowwise: #> Lag Correlation#> <int> <dbl>#> 1 1 0.943#> 2 2 0.910#> 3 3 0.885# Now we formulate the following Hypothesis# H0: new charging station increase EV adoption vs. H1: new charging station does not increase EV adoption# Check these hypotheses with a simple linear regressionlinear_charging <-lm(Total_Count ~ Count, data = merged_v_charge)# Poisson Testpoisson_model <-glm(Total_Count ~ Count, family = poisson, data = merged_v_charge)# Set up the layout using mfrowpar(mfrow =c(1, 2)) # 1 row, 2 columns# Plotting for Simple Linear Regressionplot(linear_charging, 1, main ="LM Residuals vs Fitted")plot(linear_charging, 2, main ="LM Normal Q-Q Plot")plot(linear_charging, 3, main ="LM Scale-Location Plot")plot(linear_charging, 5, main ="LM Residuals vs Leverage")# Plotting for Poisson Regressionplot(poisson_model, which =1, main ="Poisson Residuals vs Fitted")plot(poisson_model, which =2, main ="Poisson Normal Q-Q Plot")plot(poisson_model, which =3, main ="Poisson Scale-Location Plot")plot(poisson_model, which =5, main ="Poisson Residuals vs Leverage")# Printingsummary(linear_charging)#> #> Call:#> lm(formula = Total_Count ~ Count, data = merged_v_charge)#> #> Residuals:#> Min 1Q Median 3Q Max #> -6059 -2966 -1298 3627 4916 #> #> Coefficients:#> Estimate Std. Error t value Pr(>|t|) #> (Intercept) -5340.399 2122.178 -2.52 0.033 * #> Count 3.341 0.338 9.88 3.9e-06 ***#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> Residual standard error: 4150 on 9 degrees of freedom#> Multiple R-squared: 0.916, Adjusted R-squared: 0.906 #> F-statistic: 97.7 on 1 and 9 DF, p-value: 3.94e-06summary(poisson_model)#> #> Call:#> glm(formula = Total_Count ~ Count, family = poisson, data = merged_v_charge)#> #> Coefficients:#> Estimate Std. Error z value Pr(>|z|) #> (Intercept) 7.46e+00 7.67e-03 973 <2e-16 ***#> Count 2.68e-04 7.91e-07 339 <2e-16 ***#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> (Dispersion parameter for poisson family taken to be 1)#> #> Null deviance: 139780.3 on 10 degrees of freedom#> Residual deviance: 6337.1 on 9 degrees of freedom#> AIC: 6457#> #> Number of Fisher Scoring iterations: 4# Reset the layoutpar(mfrow =c(1, 1))
::: The year-on-year correlation is the highest, the lagged correlation diminishes. We could suggest hat the availability of charging stations and the new registration of EVs go hand-in-hand, and that the availability of new charging station does not create a demand of new EVs by itself. Correlation does not imply causation, while we see a relationship, we can’t conclude that charging stations directly cause changes in electric vehicle adoption only with a Correlation analysis
With both a linear regression and a Poisson-test. We find evidence of statistically significant relationship between the count of available charging station and the count of electric vehicles registered. We have Prediction variable/coefficient of 2.68 x 10^-4 and 3.34 respectively. And a p-val < 0.005 for both. However, it is important to remind ourselves that these variables have a bidirectional / mutual influence, beyond the scope of what our analysis shows. The relationship is not strictly unidirectional and therefore, it is hard to conclude anything without further domain-knowledge and context-specific information